Flume DStream produces 0 records after HDFS node killed

2017-06-19 Thread N B
Hi all, We are running a standalone Spark cluster for a streaming application. The application consumes data from Flume using a Flume polling stream created as follows: flumeStream = FlumeUtils.createPollingStream(streamingContext, socketAddress.toArray(new
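[Editor's note: for reference, a minimal Scala sketch of the polling-stream setup being described — the thread's own snippet is Java and truncated; the host and port here are placeholders for the Flume sink agent address:]

    import org.apache.spark.streaming.flume.FlumeUtils

    // Polling stream: Spark pulls events from a Flume agent running a Spark sink.
    // Host and port are placeholders for the agent address.
    val flumeStream = FlumeUtils.createPollingStream(streamingContext, "flume-agent-host", 9988)
    flumeStream.count().print()   // quick way to see whether each batch yields 0 records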

Re: How save streaming aggregations on 'Structured Streams' in parquet format ?

2017-06-19 Thread Tathagata Das
That is not the right way to use watermark + append output mode. The `withWatermark` must be before the aggregation. Something like this: df.withWatermark("timestamp", "1 hour").groupBy(window("timestamp", "30 seconds")).agg(...) Read more here -
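[Editor's note: laid out on separate lines, the pattern being described is roughly the following — a sketch where `df` and the column names come from the snippet above, and count() stands in for the elided agg(...):]

    import org.apache.spark.sql.functions.{col, window}

    val windowedCounts = df
      .withWatermark("timestamp", "1 hour")              // watermark first, on the event-time column
      .groupBy(window(col("timestamp"), "30 seconds"))   // then the windowed aggregation
      .count()                                           // stand-in for the elided agg(...)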

Spark Streaming - Increasing number of executors slows down processing rate

2017-06-19 Thread Mal Edwin
Hi All, I am struggling with an odd issue and would like your help in addressing it. Environment:
- AWS cluster (40 Spark nodes & 4-node Kafka cluster)
- Spark Kafka streaming job submitted in YARN cluster mode
- Kafka: single topic, 400 partitions
- Spark 2.1 on Cloudera, Kafka 0.10 on Cloudera
We have zero

Re: the scheme in stream reader

2017-06-19 Thread Fei Shao
Hi, I have submitted a JIRA for this issue. The link is https://issues.apache.org/jira/browse/SPARK-21147 thanks Fei Shao ---Original--- From: "Michael Armbrust" Date: 2017/6/20 03:06:49 To: "Fei Shao"<1427357...@qq.com>; Cc:

Unsubscribe

2017-06-19 Thread praba karan
Unsubscribe Sent from Yahoo Mail on Android

the meaning of partition column and bucket column please?

2017-06-19 Thread Fei Shao
Hi all, the Column class has members named isPartition and isBucket. What do they mean, please? And when should they be set to true? Thanks in advance. Fei Shao
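[Editor's note: for context, these members appear on org.apache.spark.sql.catalog.Column, as returned by spark.catalog.listColumns. They are reported by the catalog rather than set by hand: they become true for columns the table was written with via partitionBy or bucketBy. A sketch — the table and column names are made up for illustration:]

    // Hypothetical table: partitioned by "dt", bucketed by "id".
    df.write
      .partitionBy("dt")     // "dt" will be reported with isPartition = true
      .bucketBy(8, "id")     // "id" will be reported with isBucket = true
      .saveAsTable("events")

    spark.catalog.listColumns("events").show()  // output includes isPartition and isBucket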

Re: How save streaming aggregations on 'Structured Streams' in parquet format ?

2017-06-19 Thread kaniska Mandal
Hi Burak, Per your suggestion, I have specified deviceBasicAgg.withWatermark("eventtime", "30 seconds") before invoking deviceBasicAgg.writeStream()... But I am still facing: org.apache.spark.sql.AnalysisException: Append output mode not supported when there are streaming aggregations on

Re: How save streaming aggregations on 'Structured Streams' in parquet format ?

2017-06-19 Thread Felix Cheung
And perhaps the error message can be improved here? From: Tathagata Das Sent: Monday, June 19, 2017 8:24:01 PM To: kaniska Mandal Cc: Burak Yavuz; user Subject: Re: How save streaming aggregations on 'Structured Streams' in parquet

Do we have anything for Deep Learning in Spark?

2017-06-19 Thread Gaurav1809
Hi All, Similar to how we have a machine learning library called ML, do we have anything for deep learning? If yes, please share the details. If not, then what should be the approach? Thanks and regards, Gaurav Pandya

Why my project has this kind of error ?

2017-06-19 Thread 张明磊
Hello all, below is my issue. I have already rebuilt and reimported my project in IntelliJ IDEA, but it still gives me this kind of error, while I can build without error using Maven; only IDEA reports it. Does anyone know what happened with this? Thanks, Minglei

Re: How save streaming aggregations on 'Structured Streams' in parquet format ?

2017-06-19 Thread kaniska Mandal
Thanks Tathagata for the pointer. On Mon, Jun 19, 2017 at 8:24 PM, Tathagata Das wrote: > That is not the right way to use watermark + append output mode. The > `withWatermark` must be before the aggregation. Something like this. > > df.withWatermark("timestamp", "1

Spark streaming data loss

2017-06-19 Thread vasanth kumar
Hi, I have a Spark Kafka streaming job running in YARN cluster mode with spark.task.maxFailures=4 (default), spark.yarn.max.executor.failures=8, number of executors = 1, spark.streaming.stopGracefullyOnShutdown=false, and checkpointing enabled. When there is a RuntimeException in a batch on an executor, then
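[Editor's note: for reference, a minimal sketch of the checkpoint-enabled setup described above — the path and batch interval are placeholders. On restart, getOrCreate rebuilds the StreamingContext from the checkpoint so unfinished batches are replayed:]

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("kafka-stream")
      val ssc = new StreamingContext(conf, Seconds(30))    // batch interval is a placeholder
      ssc.checkpoint("hdfs:///checkpoints/kafka-stream")   // placeholder checkpoint path
      // ... create the Kafka DStream and wire up the processing here ...
      ssc
    }

    // Recovers from the checkpoint if one exists, otherwise builds a fresh context.
    val ssc = StreamingContext.getOrCreate("hdfs:///checkpoints/kafka-stream", createContext _)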

Does spark support hive table(parquet) column renaming?

2017-06-19 Thread 李斌松
Does spark support hive table(parquet) column renaming?

Stream Processing: how to refresh a loaded dataset periodically

2017-06-19 Thread aravias
Hi, we are using Structured Streaming for stream processing, and for each message, to do some data enrichment, I have to look up data from Cassandra; that data in Cassandra gets updated periodically (about once a day). I want to look at the option of loading it as a Dataset and then registering it
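[Editor's note: one pattern often suggested for this — a sketch only; loadLookup() is a hypothetical helper that reads the Cassandra table into a DataFrame — is to hold the cached lookup data behind a reference that a background task swaps out on a schedule:]

    import java.util.concurrent.{Executors, TimeUnit}
    import org.apache.spark.sql.DataFrame

    object LookupCache {
      // loadLookup() is a hypothetical helper that loads the Cassandra table.
      @volatile private var current: DataFrame = loadLookup().cache()
      def get: DataFrame = current

      private val scheduler = Executors.newSingleThreadScheduledExecutor()
      scheduler.scheduleAtFixedRate(new Runnable {
        def run(): Unit = {
          val fresh = loadLookup().cache()  // load and cache the new snapshot
          val old = current
          current = fresh                   // later micro-batches pick up the fresh data
          old.unpersist()                   // release the stale cached copy
        }
      }, 24, 24, TimeUnit.HOURS)
    }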

Re: Spark-Kafka integration - build failing with sbt

2017-06-19 Thread karan alang
Hi Cody - I do have an additional basic question. When I tried to compile the code in Eclipse, I was not able to; e.g. import org.apache.spark.streaming.kafka.KafkaUtils gave errors saying KafkaUtils was not part of the package. However, when I used sbt to compile, the compilation went

How save streaming aggregations on 'Structured Streams' in parquet format ?

2017-06-19 Thread kaniska Mandal
Hi, my goal is to (1) either chain streaming aggregations in a single query, OR (2) run multiple streaming aggregations and save the data in some meaningful format to execute low-latency / failsafe OLAP queries. So my first choice is the parquet format, but I failed to make it work! I am using
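[Editor's note: for reference, the parquet sink side of this usually looks like the sketch below — the paths are placeholders, and `deviceBasicAgg` is the aggregated streaming DataFrame from the follow-ups in this thread. In Spark 2.x the file sink only supports append mode, which is why the watermark discussion elsewhere in this thread matters:]

    val query = deviceBasicAgg.writeStream
      .format("parquet")
      .option("path", "/data/device_agg")                   // placeholder output path
      .option("checkpointLocation", "/checkpoints/device")  // required by the file sink
      .outputMode("append")                                 // the only mode the file sink supports
      .start()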

Re: how many topics spark streaming can handle

2017-06-19 Thread Bryan Jeffrey
Hello Ashok, We're consuming from more than 10 topics in some Spark streaming applications. Topic management is a concern (what is read from where, etc), but I have seen no issues from Spark itself. Regards, Bryan Jeffrey Get Outlook for Android On Mon, Jun 19, 2017 at

Re: Spark-Kafka integration - build failing with sbt

2017-06-19 Thread Cody Koeninger
org.apache.spark.streaming.kafka.KafkaUtils is in the spark-streaming-kafka-0-8 project. On Mon, Jun 19, 2017 at 1:01 PM, karan alang wrote: > Hi Cody - I do have an additional basic question. > > When I tried to compile the code in Eclipse, I was not able to do that > >
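[Editor's note: concretely, the sbt dependency for that project would be something like the following — the version is an assumption and should match your Spark build:]

    // build.sbt: Kafka 0.8 integration for Spark Streaming (version is an assumption)
    libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.1.0"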

how many topics spark streaming can handle

2017-06-19 Thread Ashok Kumar
Hi Gurus, Within one Spark streaming process how many topics can be handled? I have not tried more than one topic. Thanks

Re: how many topics spark streaming can handle

2017-06-19 Thread Michael Armbrust
I don't think that there is really a Spark specific limit here. It would be a function of the size of your spark / kafka clusters and the type of processing you are trying to do. On Mon, Jun 19, 2017 at 12:00 PM, Ashok Kumar wrote: > Hi Gurus, > > Within one Spark

Re: How save streaming aggregations on 'Structured Streams' in parquet format ?

2017-06-19 Thread Burak Yavuz
Hi Kaniska, In order to use append mode with aggregations, you need to set an event time watermark (using `withWatermark`). Otherwise, Spark doesn't know when to output an aggregation result as "final". Best, Burak On Mon, Jun 19, 2017 at 11:03 AM, kaniska Mandal

Re: the scheme in stream reader

2017-06-19 Thread Michael Armbrust
The socket source can't know how to parse your data. I think the right thing would be for it to throw an exception saying that you can't set the schema here. Would you mind opening a JIRA ticket? If you are trying to parse data from something like JSON then you should use `from_json` on the
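[Editor's note: a sketch of the from_json approach being suggested, assuming the socket lines are JSON and the schema — made up here — is known up front; the socket source always yields a single string column named "value":]

    import org.apache.spark.sql.functions.{col, from_json}
    import org.apache.spark.sql.types.{StringType, StructField, StructType, TimestampType}

    // Assumed schema, for illustration only.
    val schema = StructType(Seq(
      StructField("device", StringType),
      StructField("eventtime", TimestampType)
    ))

    val lines = spark.readStream.format("socket")
      .option("host", "localhost").option("port", 9999).load()

    // Parse the "value" string column against the user-supplied schema,
    // then flatten the resulting struct into top-level columns.
    val parsed = lines.select(from_json(col("value"), schema).as("data")).select("data.*")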

Re: how many topics spark streaming can handle

2017-06-19 Thread Ashok Kumar
Thank you. In the following example:

    val topics = "test1,test2,test3"
    val brokers = "localhost:9092"
    val topicsSet = topics.split(",").toSet
    val sparkConf = new SparkConf().setAppName("KafkaDroneCalc").setMaster("local") // spark://localhost:7077
    val sc = new
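[Editor's note: for completeness, a sketch of how such an example typically continues with the 0-8 direct stream API — the batch interval and Kafka params are minimal placeholders:]

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(sparkConf, Seconds(10))   // batch interval is a placeholder
    val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)

    // A single direct stream can read all three topics; each Kafka partition
    // maps to one RDD partition per batch.
    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topicsSet)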

spark submit with logs and kerberos

2017-06-19 Thread Juan Pablo Briganti
Hi! I have a question about logs and have not found the answer on the internet. I have a spark-submit process and I configure a custom log configuration for it using the following params: --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=customlog4j.properties" --driver-java-options
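[Editor's note: a commonly used shape for this, assuming the properties file is shipped to the executors with --files; the class and jar names below are placeholders:]

    spark-submit \
      --files customlog4j.properties \
      --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=customlog4j.properties" \
      --driver-java-options "-Dlog4j.configuration=file:customlog4j.properties" \
      --class com.example.Main app.jar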

Re: Error while doing mvn release for spark 2.0.2 using scala 2.10

2017-06-19 Thread Shixiong(Ryan) Zhu
Some of the projects (such as spark-tags) are Java projects. Spark doesn't fix the artifact name and just hard-codes 2.11. For your issue, try to use `install` rather than `package`. On Sat, Jun 17, 2017 at 7:20 PM, Kanagha Kumar wrote: > Hi, > > Bumping up again! Why does
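[Editor's note: for a Spark 2.0.x source tree, the Scala 2.10 cross-build plus local install usually looks like this, per the Spark build docs; add profiles such as -Pyarn as needed:]

    ./dev/change-scala-version.sh 2.10
    ./build/mvn -Dscala-2.10 -DskipTests clean install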

Re: Error while doing mvn release for spark 2.0.2 using scala 2.10

2017-06-19 Thread Kanagha Kumar
Thanks. But I am required to do a Maven release to Nexus of Spark 2.0.2 built against Scala 2.10. How can I go about this? Is this a bug that I need to file in the Spark JIRA? On Mon, Jun 19, 2017 at 12:12 PM, Shixiong(Ryan) Zhu < shixi...@databricks.com> wrote: > Some of the projects (such as

Merging multiple Pandas dataframes

2017-06-19 Thread saatvikshah1994
Hi, I am iteratively receiving a file which can only be opened as a Pandas dataframe. For the first such file I receive, I am converting it to a Spark dataframe using the 'createDataFrame' utility function. From the next file onward, I am converting it and unioning it into the first Spark