The default behavior should be that batch X + 1 starts processing only after batch X completes. If you are using Spark 1.4.0, could you show us a screenshot of the streaming tab, especially the list of batches? And could you also tell us if you are setting any SparkConf configurations?
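In particular, the setting that controls this is `spark.streaming.concurrentJobs`, which defaults to 1 (batches run one at a time). If it has been raised above 1, batches will be processed in parallel, which would explain the overlap below. A minimal sketch of a conf that keeps batching serial (the app name and batch interval here are illustrative):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

SparkConf conf = new SparkConf()
    .setAppName("serial-batching")          // illustrative name
    // 1 is already the default; setting it explicitly just documents
    // the intent. Values > 1 allow batches to run concurrently.
    .set("spark.streaming.concurrentJobs", "1");

JavaStreamingContext context =
    new JavaStreamingContext(conf, Durations.seconds(3));
```

So if that property is not being set anywhere in your code or spark-defaults, serial processing should be what you get out of the box.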
On Wed, Jun 17, 2015 at 12:22 PM, Michal Čizmazia <mici...@gmail.com> wrote:
> Is it possible to achieve serial batching with Spark Streaming?
>
> Example:
>
> I configure the Streaming Context to create a batch every 3 seconds.
>
> Processing of batch #2 takes longer than 3 seconds and creates a
> backlog of batches:
>
> batch #1 takes 2s
> batch #2 takes 10s
> batch #3 takes 2s
> batch #4 takes 2s
>
> When testing locally, it seems that processing of multiple batches
> finishes at the same time:
>
> batch #1 finished at 2s
> batch #2 finished at 12s
> batch #3 finished at 12s (processed in parallel)
> batch #4 finished at 15s
>
> How can I delay processing of the next batch, so that it is processed
> only after processing of the previous batch has completed?
>
> batch #1 finished at 2s
> batch #2 finished at 12s
> batch #3 finished at 14s (processed serially)
> batch #4 finished at 16s
>
> I want to perform a transformation for every key only once in a given
> period of time (e.g. the batch duration). I find all unique keys in a
> batch and perform the transformation on each key. To ensure that the
> transformation is done for every key only once, only one batch can be
> processed at a time. At the same time, I want that single batch to be
> processed in parallel.
>
> context = new JavaStreamingContext(conf, Durations.seconds(10));
> stream = context.receiverStream(...);
> stream
>     .reduceByKey(...)
>     .transform(...)
>     .foreachRDD(output);
>
> Any ideas or pointers are very welcome.
>
> Thanks!