The default behavior should be that batch X + 1 starts processing only after batch X completes. If you are using Spark 1.4.0, could you show us a screenshot of the streaming tab, especially the list of batches? And could you also tell us if you are setting any SparkConf configurations?
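In particular, the setting that controls this is `spark.streaming.concurrentJobs`, which defaults to 1 (batches run one at a time). If it has been raised above 1, batches will be processed in parallel, which would explain the overlap below. A minimal sketch of a conf that keeps batching serial (the app name and batch interval here are illustrative):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

SparkConf conf = new SparkConf()
    .setAppName("serial-batching")          // illustrative name
    // 1 is already the default; setting it explicitly just documents
    // the intent. Values > 1 allow batches to run concurrently.
    .set("spark.streaming.concurrentJobs", "1");

JavaStreamingContext context =
    new JavaStreamingContext(conf, Durations.seconds(3));
```

So if that property is not being set anywhere in your code or spark-defaults, serial processing should be what you get out of the box.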
On Wed, Jun 17, 2015 at 12:22 PM, Michal Čizmazia <mici...@gmail.com> wrote:
> Is it possible to achieve serial batching with Spark Streaming?
>
> Example:
>
> I configure the Streaming Context to create a batch every 3 seconds.
>
> Processing of batch #2 takes longer than 3 seconds and creates a
> backlog of batches:
>
> batch #1 takes 2s
> batch #2 takes 10s
> batch #3 takes 2s
> batch #4 takes 2s
>
> When testing locally, it seems that processing of multiple batches
> finishes at the same time:
>
> batch #1 finished at 2s
> batch #2 finished at 12s
> batch #3 finished at 12s (processed in parallel)
> batch #4 finished at 15s
>
> How can I delay processing of the next batch, so that it is processed
> only after processing of the previous batch has completed?
>
> batch #1 finished at 2s
> batch #2 finished at 12s
> batch #3 finished at 14s (processed serially)
> batch #4 finished at 16s
>
> I want to perform a transformation for every key only once in a given
> period of time (e.g. the batch duration). I find all unique keys in a
> batch and perform the transformation on each key. To ensure that the
> transformation is done for every key only once, only one batch can be
> processed at a time. At the same time, I want that single batch to be
> processed in parallel.
>
> context = new JavaStreamingContext(conf, Durations.seconds(10));
> stream = context.receiverStream(...);
> stream
>     .reduceByKey(...)
>     .transform(...)
>     .foreachRDD(output);
>
> Any ideas or pointers are very welcome.
>
> Thanks!