Yeah, that's the one we discussed... sorry, I pointed to a different one that I was reading.

On Wed, Aug 20, 2014 at 3:28 PM, DB Tsai <dbt...@dbtsai.com> wrote:
> To be specific, I was discussing this PR with Debasish which reduces
> lots of issues when sending big objects to executors without using
> broadcast explicitly.
>
> Broadcast RDD object once per TaskSet (instead of sending it for every
> task)
> https://issues.apache.org/jira/browse/SPARK-2521
>
> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
>
> On Wed, Aug 20, 2014 at 3:19 PM, Debasish Das <debasish.da...@gmail.com>
> wrote:
> > Hi Patrick,
> >
> > Last few days I came across some bugs which got exposed due to ALS runs on
> > large scale data...although it was not related to the akka changes but
> > during the debug I found across some akka related changes that might have
> > an impact of overall performance...one example is the following:
> >
> > https://github.com/apache/spark/pull/1907
> >
> > @dbtsai explained it to me a bit yesterday that in 1.1 RDDs are no longer
> > sent through akka msgs but over http-channels...If there is a document
> > detailing the architecture that is currently in-place (like how the core
> > changed from 1.0 to 1.1) it will help a lot in debugging the jobs which are
> > built upon the libraries like mllib and optimize them further for
> > efficiency...
> >
> > For using the Spark actor system directly:
> >
> > I spent few weeks December 2013 to make the Scalafish code (
> > https://github.com/azymnis/scalafish) operational on 10 nodes...It uses
> > scalding for matrix partitioning and actorSystem to coordinate the
> > updates...It is a cool use of akka but getting an actor system operational
> > is difficult...
> >
> > Since Spark already has tested version of actor system running on both
> > standalone and yarn modes, I am planning to port scalafish to spark using
> > actor model...That's one of the use-cases I am looking for...
> >
> > Another use-case that I am considering is to send msgs directly from kafka
> > queues to spark actorSystem for processing to get Storm like
> > latency...basically window sizes of 1-2 ms and no overhead of using an RDD
> > if possible...
> >
> > Thanks.
> > Deb
> >
> >
> > On Wed, Aug 20, 2014 at 1:42 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> >
> >> Hey Deb,
> >>
> >> Can you be specific what changes you are mentioning? We have not, to my
> >> knowledge, made major architectural changes around akka use.
> >>
> >> I think in general we don't want people to be using Spark's actor system
> >> directly - it is an internal communication component in Spark and could
> >> e.g. be re-factored later to not use akka at all. Could you elaborate a bit
> >> more on your use case?
> >>
> >> - Patrick
> >>
> >>
> >> On Wed, Aug 20, 2014 at 9:02 AM, Debasish Das <debasish.da...@gmail.com>
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> There have been some recent changes in the way akka is used in spark and I
> >>> feel they are major changes...
> >>>
> >>> Is there a design document / JIRA / experiment on large datasets that
> >>> highlight the impact of changes (1.0 vs 1.1) ? Basically it will be great
> >>> to understand where akka is used in the code base...
> >>>
> >>> If I don't have to broadcast big variables but use akka's programming model
> >>> (use actors directly) on Spark's actorsystem is that allowed ? I understand
> >>> that it might look hacky :-)
> >>>
> >>> Thanks.
> >>> Deb