Hi Patrick,

Over the last few days I came across some bugs that were exposed by ALS runs
on large-scale data. Although they were not related to the akka changes,
while debugging I came across some akka-related changes that might have an
impact on overall performance. One example is the following:

https://github.com/apache/spark/pull/1907

@dbtsai explained to me a bit yesterday that in 1.1 RDDs are no longer sent
through akka messages but over HTTP channels. If there is a document
detailing the architecture currently in place (for example, how the core
changed from 1.0 to 1.1), it would help a lot in debugging jobs built on
libraries like MLlib and in optimizing them further for efficiency.

Regarding using Spark's actor system directly:

I spent a few weeks in December 2013 making the Scalafish code
(https://github.com/azymnis/scalafish) operational on 10 nodes. It uses
Scalding for matrix partitioning and an akka ActorSystem to coordinate the
updates. It is a cool use of akka, but getting an actor system operational
is difficult.

Since Spark already has a tested actor system running in both standalone
and YARN modes, I am planning to port Scalafish to Spark using the actor
model. That's one of the use cases I have in mind.

Another use case I am considering is sending messages directly from Kafka
queues to Spark's actor system for processing, to get Storm-like latency:
basically window sizes of 1-2 ms and, if possible, no overhead from using
an RDD.

Thanks.
Deb


On Wed, Aug 20, 2014 at 1:42 PM, Patrick Wendell <pwend...@gmail.com> wrote:

> Hey Deb,
>
> Can you be specific about which changes you are referring to? We have
> not, to my knowledge, made major architectural changes around akka use.
>
> I think in general we don't want people to be using Spark's actor system
> directly - it is an internal communication component in Spark and could,
> e.g., be refactored later to not use akka at all. Could you elaborate a
> bit more on your use case?
>
> - Patrick
>
>
> On Wed, Aug 20, 2014 at 9:02 AM, Debasish Das <debasish.da...@gmail.com>
> wrote:
>
>> Hi,
>>
>> There have been some recent changes in the way akka is used in Spark, and
>> I feel they are major changes.
>>
>> Is there a design document / JIRA / experiment on large datasets that
>> highlights the impact of the changes (1.0 vs 1.1)? Basically, it would be
>> great to understand where akka is used in the code base.
>>
>> If, instead of broadcasting big variables, I use akka's programming model
>> (using actors directly) on Spark's actor system, is that allowed? I
>> understand that it might look hacky :-)
>>
>> Thanks.
>> Deb
>>
>
>
