Hi Patrick, Last few days I came across some bugs which got exposed due to ALS runs on large scale data...although it was not related to the akka changes but during the debug I found across some akka related changes that might have an impact of overall performance...one example is the following:
https://github.com/apache/spark/pull/1907 @dbtsai explained it to me a bit yesterday that in 1.1 RDDs are no longer sent through akka msgs but over http-channels...If there is a document detailing the architecture that is currently in-place (like how the core changed from 1.0 to 1.1) it will help a lot in debugging the jobs which are built upon the libraries like mllib and optimize them further for efficiency... For using the Spark actor system directly: I spent few weeks December 2013 to make the Scalafish code ( https://github.com/azymnis/scalafish) operational on 10 nodes...It uses scalding for matrix partitioning and actorSystem to coordinate the updates...It is a cool use of akka but getting an actor system operational is difficult... Since Spark already has tested version of actor system running on both standalone and yarn modes, I am planning to port scalafish to spark using actor model...That's one of the use-cases I am looking for... Another use-case that I am considering is to send msgs directly from kafka queues to spark actorSystem for processing to get Storm like latency...basically window sizes of 1-2 ms and no overhead of using an RDD if possible... Thanks. Deb On Wed, Aug 20, 2014 at 1:42 PM, Patrick Wendell <pwend...@gmail.com> wrote: > Hey Deb, > > Can you be specific what changes you are mentioning? We have not, to my > knowledge, made major architectural changes around akka use. > > I think in general we don't want people to be using Spark's actor system > directly - it is an internal communication component in Spark and could > e.g. be re-factored later to not use akka at all. Could you elaborate a bit > more on your use case? > > - Patrick > > > On Wed, Aug 20, 2014 at 9:02 AM, Debasish Das <debasish.da...@gmail.com> > wrote: > >> Hi, >> >> There have been some recent changes in the way akka is used in spark and I >> feel they are major changes... >> >> Is there a design document / JIRA / experiment on large datasets that >> highlight the impact of changes (1.0 vs 1.1) ? Basically it will be great >> to understand where akka is used in the code base... >> >> If I don't have to broadcast big variables but use akka's programming >> model >> (use actors directly) on Spark's actorsystem is that allowed ? I >> understand >> that it might look hacky :-) >> >> Thanks. >> Deb >> > >