There is a BlockGenerator on each worker node, next to the
ReceiverSupervisorImpl, which generates Blocks out of an ArrayBuffer on
each block interval (block_interval). These Blocks are passed to the
ReceiverSupervisorImpl, which pushes them into the BlockManager for
storage, and the corresponding BlockInfos are passed to the driver.
Mini-batches are then created by the JobGenerator component on the driver
on each batch_interval. I guess what you are looking for is provided by a
continuous model like Flink's. We are creating mini-batches to provide
fault tolerance.
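
For reference, here is a minimal sketch of how the two intervals relate
(assuming a recent Spark version; the concrete values and the local[2]
master are only for illustration):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  // Receivers chunk incoming data into Blocks every block interval; the
  // JobGenerator assembles the Blocks of each batch interval into one RDD.
  val conf = new SparkConf()
    .setAppName("BlockIntervalSketch")
    .setMaster("local[2]")                          // one core for the receiver, one for processing
    .set("spark.streaming.blockInterval", "200ms")  // block_interval (default 200ms)

  val ssc = new StreamingContext(conf, Seconds(2))  // batch_interval: ~10 blocks per receiver per batch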

Zvara Zoltán



mail, hangout, skype: zoltan.zv...@gmail.com

mobile, viber: +36203129543

bank: 10918001-00000021-50480008

address: Hungary, 2475 Kápolnásnyék, Kossuth 6/a

elte: HSKSJZ (ZVZOAAI.ELTE)

2015-03-24 11:55 GMT+01:00 Arush Kharbanda <ar...@sigmoidanalytics.com>:

> The block size is configurable, and that way I think you can reduce the
> block interval, to keep the block in memory only for a limited interval?
> Is that what you are looking for?
>
> On Tue, Mar 24, 2015 at 1:38 PM, Bin Wang <wbi...@gmail.com> wrote:
>
> > Hi,
> >
> > I'm learning Spark and I find there could be some optimization of the
> > current streaming implementation. Correct me if I'm wrong.
> >
> > The current streaming implementation puts the data of one batch into
> > memory (as an RDD), but that does not seem necessary.
> >
> > For example, if I want to count the lines which contain the word
> > "Spark", I just need to map every line to check whether it contains the
> > word, then reduce with a sum function. After that, there is no reason to
> > keep the line in memory.
> >
> > That is to say, if the DStream only has one map and/or reduce operation
> > on it, it is not necessary to keep all the batch data in memory.
> > Something like a pipeline should be OK.
> >
> > Is it difficult to implement on top of the current implementation?
> >
> > Thanks.
> >
> > ---
> > Bin Wang
> >
>
>
>
> --
>
>
> *Arush Kharbanda* || Technical Teamlead
>
> ar...@sigmoidanalytics.com || www.sigmoidanalytics.com
>
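
To make Bin's example concrete, here is a minimal DStream sketch of the
"count lines containing Spark" pipeline (the socket source, host and port
are placeholders; any receiver-based source works the same way):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val conf = new SparkConf().setAppName("SparkLineCount").setMaster("local[2]")
  val ssc  = new StreamingContext(conf, Seconds(2))   // 2 s mini-batches

  // Placeholder source: one line of text per record.
  val lines = ssc.socketTextStream("localhost", 9999)

  // Map each line to 0/1 and reduce with a sum -- per batch, only the final
  // count is needed downstream, which is why a pipelined evaluation looks
  // attractive here.
  val sparkLineCount = lines
    .map(line => if (line.contains("Spark")) 1L else 0L)
    .reduce(_ + _)

  sparkLineCount.print()

  ssc.start()
  ssc.awaitTermination()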
