The stream receiver seems to leverage actor receivers http://spark.apache.org/docs/0.8.1/streaming-custom-receivers.html But spark system doesnt lend itself to a messaging kind of a structure.. more of a DAG kind Just curious are you looking for the actor subsystem to act on messages or just looking to use them as a local/distributed message bus
Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi <https://twitter.com/mayur_rustagi> On Thu, Aug 21, 2014 at 4:04 AM, Debasish Das <debasish.da...@gmail.com> wrote: > Yeah that's the one we discussed...sorry I pointed to a different one that > I was reading... > > > On Wed, Aug 20, 2014 at 3:28 PM, DB Tsai <dbt...@dbtsai.com> wrote: > > > To be specific, I was discussing this PR with Debasish which reduces > > lots of issues when sending big objects to executors without using > > broadcast explicitly. > > > > Broadcast RDD object once per TaskSet (instead of sending it for every > > task) > > https://issues.apache.org/jira/browse/SPARK-2521 > > > > Sincerely, > > > > DB Tsai > > ------------------------------------------------------- > > My Blog: https://www.dbtsai.com > > LinkedIn: https://www.linkedin.com/in/dbtsai > > > > > > On Wed, Aug 20, 2014 at 3:19 PM, Debasish Das <debasish.da...@gmail.com> > > wrote: > > > Hi Patrick, > > > > > > Last few days I came across some bugs which got exposed due to ALS runs > > on > > > large scale data...although it was not related to the akka changes but > > > during the debug I found across some akka related changes that might > have > > > an impact of overall performance...one example is the following: > > > > > > https://github.com/apache/spark/pull/1907 > > > > > > @dbtsai explained it to me a bit yesterday that in 1.1 RDDs are no > longer > > > sent through akka msgs but over http-channels...If there is a document > > > detailing the architecture that is currently in-place (like how the > core > > > changed from 1.0 to 1.1) it will help a lot in debugging the jobs which > > are > > > built upon the libraries like mllib and optimize them further for > > > efficiency... > > > > > > For using the Spark actor system directly: > > > > > > I spent few weeks December 2013 to make the Scalafish code ( > > > https://github.com/azymnis/scalafish) operational on 10 nodes...It > uses > > > scalding for matrix partitioning and actorSystem to coordinate the > > > updates...It is a cool use of akka but getting an actor system > > operational > > > is difficult... > > > > > > Since Spark already has tested version of actor system running on both > > > standalone and yarn modes, I am planning to port scalafish to spark > using > > > actor model...That's one of the use-cases I am looking for... > > > > > > Another use-case that I am considering is to send msgs directly from > > kafka > > > queues to spark actorSystem for processing to get Storm like > > > latency...basically window sizes of 1-2 ms and no overhead of using an > > RDD > > > if possible... > > > > > > Thanks. > > > Deb > > > > > > > > > On Wed, Aug 20, 2014 at 1:42 PM, Patrick Wendell <pwend...@gmail.com> > > wrote: > > > > > >> Hey Deb, > > >> > > >> Can you be specific what changes you are mentioning? We have not, to > my > > >> knowledge, made major architectural changes around akka use. > > >> > > >> I think in general we don't want people to be using Spark's actor > system > > >> directly - it is an internal communication component in Spark and > could > > >> e.g. be re-factored later to not use akka at all. Could you elaborate > a > > bit > > >> more on your use case? > > >> > > >> - Patrick > > >> > > >> > > >> On Wed, Aug 20, 2014 at 9:02 AM, Debasish Das < > debasish.da...@gmail.com > > > > > >> wrote: > > >> > > >>> Hi, > > >>> > > >>> There have been some recent changes in the way akka is used in spark > > and I > > >>> feel they are major changes... > > >>> > > >>> Is there a design document / JIRA / experiment on large datasets that > > >>> highlight the impact of changes (1.0 vs 1.1) ? Basically it will be > > great > > >>> to understand where akka is used in the code base... > > >>> > > >>> If I don't have to broadcast big variables but use akka's programming > > >>> model > > >>> (use actors directly) on Spark's actorsystem is that allowed ? I > > >>> understand > > >>> that it might look hacky :-) > > >>> > > >>> Thanks. > > >>> Deb > > >>> > > >> > > >> > > >