Re: Spark Streaming threading model

Tathagata Das Wed, 25 Sep 2013 15:02:18 -0700

On Wed, Sep 25, 2013 at 12:30 PM, Gerard Maas <[email protected]> wrote:


> Hi Tathagata,
>
> Many thanks for the extended answer and the clarifications on the kafka
> data distribution in the cluster.
>
> There are many points to handle, so, to start somewhere:
>
> > Case (ii) could have been implemented as an actor as it just inserts a
> > record on an arraybuffer (i.e., a very small task). However, with rates of
> > more than 100K records received per second, I was unsure what the overhead
> > of sending each record as a message through the actor library would be
> > like.
>
> I'm personally curious about this point. I could investigate by creating a
> simplified test scenario that isolates the data accumulator case and compares
> the performance of both models (actors vs threads with proper locking)
> under different levels of concurrency.
> Do you think this could be helpful for the project? I'm looking to
> contribute and this could be an interesting starting point.
>
>
Yes! Actors vs threads with locking is a great test to do, since for
Kafka (and who knows what other sources in the future) the block generator
has to support ingestion from multiple threads. I think one also needs to
compare against a single thread without locking (the current model). If a
single thread without locking is the fastest, and threads with locking are
not so bad compared to actors, then it may be better to leave the ingestion
without locks for maximum throughput for single-thread sources (e.g. Socket,
and most others) and add a lock only for multi-thread sources like Kafka.
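
To make the three-way comparison concrete, here is a minimal sketch of what
such a test could look like. Everything in it is an assumption for
illustration: the class and method names are hypothetical and not part of
Spark, a Java ArrayList stands in for the arraybuffer, and a single consumer
thread draining a queue stands in for an actor (it is not the actor library
itself, so its numbers would only be a rough proxy).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical micro-benchmark; none of these names are Spark APIs.
class IngestBench {
    private static final int POISON = -1; // end-of-stream marker (real records are >= 0)

    // Join helper that converts the checked exception into an unchecked one.
    static void joinAll(Thread... threads) {
        for (Thread t : threads) {
            try { t.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
    }

    // (a) Current model: a single thread appends to the buffer with no lock.
    static int singleThreadNoLock(int records) {
        List<Integer> buffer = new ArrayList<>();
        for (int i = 0; i < records; i++) buffer.add(i);
        return buffer.size();
    }

    // (b) Several producer threads append to one shared buffer under a lock.
    static int threadsWithLock(int threads, int perThread) {
        List<Integer> buffer = new ArrayList<>();
        Thread[] producers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            producers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    synchronized (buffer) { buffer.add(i); }
                }
            });
            producers[t].start();
        }
        joinAll(producers);
        return buffer.size();
    }

    // (c) Actor-like model: producers send each record as a message to a
    // mailbox; one consumer thread owns the buffer, so the buffer needs no lock.
    static int mailboxConsumer(int threads, int perThread) {
        BlockingQueue<Integer> mailbox = new LinkedBlockingQueue<>();
        List<Integer> buffer = new ArrayList<>();
        Thread consumer = new Thread(() -> {
            try {
                int finished = 0;
                while (finished < threads) {
                    int msg = mailbox.take();
                    if (msg == POISON) finished++; else buffer.add(msg);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        Thread[] producers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            producers[t] = new Thread(() -> {
                try {
                    for (int i = 0; i < perThread; i++) mailbox.put(i);
                    mailbox.put(POISON); // tell the consumer this producer is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            producers[t].start();
        }
        joinAll(producers);
        joinAll(consumer);
        return buffer.size();
    }

    public static void main(String[] args) {
        int threads = 4, perThread = 250_000;
        long t0 = System.nanoTime();
        singleThreadNoLock(threads * perThread);
        long t1 = System.nanoTime();
        threadsWithLock(threads, perThread);
        long t2 = System.nanoTime();
        mailboxConsumer(threads, perThread);
        long t3 = System.nanoTime();
        System.out.printf("no lock: %d ms, lock: %d ms, mailbox: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, (t3 - t2) / 1_000_000);
    }
}
```

The timings from a single run like this are only indicative, since there is
no JIT warm-up and no repetition. A proper comparison would probably use a
benchmark harness and vary the number of producer threads.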



> >> I probably went into more detail than you wanted to know. :)
> Absolutely not. The more, the better :-)
>
> -kr, Gerard.
>
