There is a BlockGenerator on each worker node, next to the ReceiverSupervisorImpl, which generates blocks out of an ArrayBuffer on every block interval (spark.streaming.blockInterval). These blocks are handed to the ReceiverSupervisorImpl, which puts them into the BlockManager for storage; the corresponding BlockInfos are reported to the driver. Mini-batches are then created by the JobGenerator component on the driver on every batch interval. I guess what you are looking for is provided by a continuous model like Flink's. We create mini-batches to provide fault tolerance.
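To make the two intervals concrete, here is a minimal sketch of how they are set (the app name, master URL, host and port are just placeholders); with these values each 2-second batch is assembled from roughly 2000 ms / 200 ms = 10 blocks, and each block becomes one partition of the batch RDD:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object IntervalSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("interval-sketch")                // placeholder app name
      .setMaster("local[2]")                        // placeholder master
      .set("spark.streaming.blockInterval", "200")  // BlockGenerator cuts a new block every 200 ms

    // JobGenerator creates one mini-batch (job) every 2 seconds
    val ssc = new StreamingContext(conf, Seconds(2))

    // placeholder receiver-based source
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.print()

    ssc.start()
    ssc.awaitTermination()
  }
}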
Zvara Zoltán
zoltan.zv...@gmail.com

2015-03-24 11:55 GMT+01:00 Arush Kharbanda <ar...@sigmoidanalytics.com>:

> The block size is configurable, and that way I think you can reduce the
> block interval to keep the block in memory only for a limited interval.
> Is that what you are looking for?
>
> On Tue, Mar 24, 2015 at 1:38 PM, Bin Wang <wbi...@gmail.com> wrote:
>
> > Hi,
> >
> > I'm learning Spark and I think the current streaming implementation
> > could be optimized. Correct me if I'm wrong.
> >
> > The current streaming implementation puts the data of one batch into
> > memory (as an RDD), but that does not seem necessary.
> >
> > For example, if I want to count the lines which contain the word
> > "Spark", I just need to map every line to see whether it contains the
> > word, then reduce with a sum function. After that, the line is no
> > longer useful, so there is no need to keep it in memory.
> >
> > That is to say, if the DStream has only one map and/or reduce operation
> > on it, it is not necessary to keep all the batch data in memory;
> > something like a pipeline should be fine.
> >
> > Is it difficult to implement this on top of the current implementation?
> >
> > Thanks.
> >
> > ---
> > Bin Wang
>
> --
> Arush Kharbanda || Technical Teamlead
> ar...@sigmoidanalytics.com || www.sigmoidanalytics.com
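PS: for reference, a minimal sketch of the job Bin describes above, on the current DStream API (the socket source, host and port are just placeholders): every line is mapped to 0 or 1 and then reduced with a sum within each mini-batch.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CountSparkLines {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("count-spark-lines").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))  // one mini-batch every 10 seconds

    // placeholder source: lines arriving on a TCP socket
    val lines = ssc.socketTextStream("localhost", 9999)

    // map every line to 1 if it contains the word, then reduce with a sum per batch
    val matches = lines
      .map(line => if (line.contains("Spark")) 1L else 0L)
      .reduce(_ + _)

    matches.print()

    ssc.start()
    ssc.awaitTermination()
  }
}

Today each batch is still materialized from the receiver's blocks before the map/reduce runs, which is exactly the part a pipelined model would avoid.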