Re: Structured Streaming Questions

2017-06-28 Thread Tathagata Das
Answers inline. On Wed, Jun 28, 2017 at 10:27 AM, Revin Chalil wrote: "I am using Structured Streaming with Spark 2.1 and have some basic questions. Is there a way to automatically refresh the Hive partitions when using the Parquet sink with partitioning?"

Structured Streaming Questions

2017-06-28 Thread Revin Chalil
I am using Structured Streaming with Spark 2.1 and have some basic questions. * Is there a way to automatically refresh the Hive partitions when using the Parquet sink with partitioning? My query looks like the one below: val queryCount = windowedCount
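For readers landing on this thread, here is a minimal sketch of the kind of partitioned Parquet sink being asked about, assuming a SparkSession `spark`, a streaming DataFrame `windowedCount` with a `date` column, and a Hive table `events` (all hypothetical). Hive does not discover new partition directories on its own, so the metastore typically has to be synced separately:

```scala
// Sketch only: paths, table name, and partition column are assumptions.
val query = windowedCount.writeStream
  .format("parquet")
  .option("path", "/warehouse/events")         // hypothetical output path
  .option("checkpointLocation", "/chk/events") // required for file sinks
  .partitionBy("date")
  .start()

// One common workaround: periodically ask Hive to rescan the directory
// layout so newly written partitions become queryable.
spark.sql("MSCK REPAIR TABLE events")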

Spark Streaming questions, just 2

2017-03-21 Thread shyla deshpande
Hello all, I have a couple of Spark Streaming questions. Thanks. 1. In the case of stateful operations, the data is, by default, persisted in memory. Does "in memory" mean MEMORY_ONLY? When is it removed from memory? 2. I do not see any documentation for spark.cleaner.ttl
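A hedged sketch of the kind of stateful operation the question refers to, assuming a StreamingContext `ssc` and a DStream of (word, count) pairs named `pairs` (both hypothetical); stateful operators require a checkpoint directory, and the resulting state DStream is persisted in memory automatically:

```scala
// Stateful operations need checkpointing enabled.
ssc.checkpoint("/chk/state") // hypothetical checkpoint directory

// Keep a running count per key across batches.
val updateFn = (values: Seq[Int], state: Option[Int]) =>
  Some(values.sum + state.getOrElse(0))

val runningCounts = pairs.updateStateByKey[Int](updateFn)

// The default in-memory persistence can be overridden explicitly if needed:
runningCounts.persist(org.apache.spark.storage.StorageLevel.MEMORY_ONLY)
```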

Re: spark streaming questions

2016-06-22 Thread pandees waran
…dehmich.wordpress.com … On 22 June 2016 at 15:54, pandees waran <pande...@gmail.com> wrote: "Hi Mich, please let me know if you have any thoughts on the below. -- Forwarded message -- From: pa…"

Re: spark streaming questions

2016-06-22 Thread pandees waran
…andees waran <pande...@gmail.com> wrote: "Hi Mich, please let me know if you have any thoughts on the below. -- Forwarded message -- From: pandees waran <pande...@gmail.com> Date: Wed, Jun 22, 2016 at 7:53 AM Subj…"

Re: spark streaming questions

2016-06-22 Thread Mich Talebzadeh
…om> wrote: "Hi Mich, please let me know if you have any thoughts on the below. -- Forwarded message -- From: pandees waran <pande...@gmail.com> Date: Wed, Jun 22, 2016 at 7:53 AM Subject: spark streaming questions To: user@spark.apache.org …"

spark streaming questions

2016-06-22 Thread pandees waran
Hello all, I have a few questions regarding Spark Streaming: * I am wondering if anyone uses Spark Streaming with workflow orchestrators such as Data Pipeline/SWF/any other framework. Are there any advantages/drawbacks to using a workflow orchestrator for Spark Streaming? * How do you guys manage

Re: [SPARK STREAMING] Questions regarding foreachPartition

2015-11-17 Thread Nipun Arora
Thanks Cody, that's what I thought. Currently, in the cases where I want global ordering, I am doing a collect() call and going through everything in the client. I wonder if there is a way to do a globally ordered execution across micro-batches in a better way? I am having some trouble with

[SPARK STREAMING] Questions regarding foreachPartition

2015-11-16 Thread Nipun Arora
Hi, I wanted to understand the foreachPartition logic. In the code below, I am assuming the iterator is executing in a distributed fashion. 1. Assuming I have a stream with timestamp data which is sorted, will the string iterator in foreachPartition process each line in order? 2. Assuming I have

Re: [SPARK STREAMING] Questions regarding foreachPartition

2015-11-16 Thread Cody Koeninger
Ordering would be on a per-partition basis, not global ordering. You typically want to acquire resources inside the foreachPartition closure, just before handling the iterator. http://spark.apache.org/docs/latest/streaming-programming-guide.html#design-patterns-for-using-foreachrdd On Mon, Nov
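The design pattern Cody links to can be sketched as follows; `dstream`, `ConnectionPool`, and the `send` call are assumptions for illustration, not part of the original thread:

```scala
// Acquire the connection inside the foreachPartition closure so it is
// created on the executor, once per partition, instead of being
// (unsuccessfully) serialized from the driver.
dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    val connection = ConnectionPool.getConnection() // hypothetical pool
    partitionOfRecords.foreach(record => connection.send(record.toString))
    ConnectionPool.returnConnection(connection)     // reuse across batches
  }
}
```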

2 spark streaming questions

2014-11-23 Thread tian zhang
Hi, dear Spark Streaming developers and users, we are prototyping with Spark Streaming and hit the following 2 issues that I would like your expertise on. 1) We have a Spark Streaming application in Scala that reads data from Kafka into a DStream, does some processing, and outputs a
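For context, reading Kafka into a DStream in that era of Spark typically looked like the sketch below; the broker and topic names are hypothetical, `ssc` is an assumed StreamingContext, and the exact API varies by Spark/Kafka version:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// Direct (receiver-less) Kafka stream, Spark 1.x style.
val kafkaParams = Map("metadata.broker.list" -> "broker:9092") // hypothetical
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("events")) // hypothetical topic

stream.map(_._2) // keep the message value, drop the key
  .print()
```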

Re: streaming questions

2014-07-02 Thread mcampbell

Re: spark streaming questions

2014-06-25 Thread Chen Song
Thanks Anwar. On Tue, Jun 17, 2014 at 11:54 AM, Anwar Rizal <anriza...@gmail.com> wrote: On Tue, Jun 17, 2014 at 5:39 PM, Chen Song <chen.song...@gmail.com> wrote: "Hey, I am new to Spark Streaming and apologize if these questions have been asked. * In StreamingContext, reduceByKey() seems…"

spark streaming questions

2014-06-17 Thread Chen Song
Hey, I am new to Spark Streaming and apologize if these questions have been asked. * In StreamingContext, reduceByKey() seems to only work on the RDDs of the current batch interval, not including RDDs of previous batches. Is my understanding correct? * If the above statement is correct, what
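The usual answer to this question is that reduceByKey aggregates within one batch, while windowed operators span batches; a sketch, where the `pairs` DStream is hypothetical:

```scala
import org.apache.spark.streaming.Seconds

// Per-batch aggregation only: each batch interval starts from scratch.
val perBatch = pairs.reduceByKey(_ + _)

// Aggregation across the RDDs of the last 30 seconds, recomputed every 10.
val windowed = pairs.reduceByKeyAndWindow(
  (a: Int, b: Int) => a + b,
  Seconds(30), // window length
  Seconds(10)) // slide interval
```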

Fwd: spark streaming questions

2014-06-16 Thread Chen Song
Hey, I am new to Spark Streaming and apologize if these questions have been asked. * In StreamingContext, reduceByKey() seems to only work on the RDDs of the current batch interval, not including RDDs of previous batches. Is my understanding correct? * If the above statement is correct, what

streaming questions

2014-03-26 Thread Diana Carroll
I'm trying to understand Spark streaming, hoping someone can help. I've kinda-sorta got a version of Word Count running, and it looks like this: import org.apache.spark.streaming.{Seconds, StreamingContext} import org.apache.spark.streaming.StreamingContext._ object StreamingWordCount { def
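The preview cuts off mid-definition; here is a hedged reconstruction of the kind of program Diana describes, with everything beyond the quoted imports an assumption (socket source, port, batch interval):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    // Two local cores: one for the receiver, one for processing.
    val conf = new SparkConf().setAppName("StreamingWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(2))

    val lines = ssc.socketTextStream("localhost", 9999) // hypothetical source
    val wordCounts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```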

Re: streaming questions

2014-03-26 Thread Tathagata Das
*Answer 1:* Make sure you are using master local[n] with n > 1 (assuming you are running it in local mode). The way Spark Streaming works is that it assigns a core to the data receiver, and so if you run the program with only one core (i.e., with local or local[1]), then it won't have resources to

RE: streaming questions

2014-03-26 Thread Adrian Mocanu
…empty RDD: rdd.fold(0)(_ + _) // no problem with an empty RDD -A From: Diana Carroll [mailto:dcarr...@cloudera.com] Sent: March 26, 2014, 2:09 PM To: user Subject: streaming questions "I'm trying to understand Spark streaming, hoping someone can help. I've kinda-sorta got a version of Word Count running…"
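Adrian's point is that fold with a zero element tolerates an empty RDD where reduce does not; a sketch assuming a SparkContext `sc`:

```scala
val empty = sc.parallelize(Seq.empty[Int])

// empty.reduce(_ + _)         // throws UnsupportedOperationException on an empty RDD
val sum = empty.fold(0)(_ + _) // returns the zero element, 0, instead
```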

Re: streaming questions

2014-03-26 Thread Diana Carroll
Thanks, Tathagata, very helpful. A follow-up question below... On Wed, Mar 26, 2014 at 2:27 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote: *Answer 3:* You can do something like wordCounts.foreachRDD((rdd: RDD[...], time: Time) => { if (rdd.take(1).size == 1) { // There exists
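A hedged completion of the snippet quoted above, treating `wordCounts` as a DStream[(String, Int)] and the output path as an assumption:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.Time

wordCounts.foreachRDD((rdd: RDD[(String, Int)], time: Time) => {
  if (rdd.take(1).size == 1) {
    // There exists at least one record in this batch; safe to act on it.
    rdd.saveAsTextFile(s"/out/wordcounts-${time.milliseconds}") // hypothetical path
  }
})
```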