Yeah that's the one we discussed...sorry I pointed to a different one that
I was reading...


On Wed, Aug 20, 2014 at 3:28 PM, DB Tsai <dbt...@dbtsai.com> wrote:

> To be specific, I was discussing this PR with Debasish which reduces
> lots of issues when sending big objects to executors without using
> broadcast explicitly.
>
> Broadcast RDD object once per TaskSet (instead of sending it for every
> task)
> https://issues.apache.org/jira/browse/SPARK-2521
>
> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
>
> On Wed, Aug 20, 2014 at 3:19 PM, Debasish Das <debasish.da...@gmail.com>
> wrote:
> > Hi Patrick,
> >
> > Last few days I came across some bugs which got exposed due to ALS runs
> on
> > large scale data...although it was not related to the akka changes but
> > during the debug I found across some akka related changes that might have
> > an impact of overall performance...one example is the following:
> >
> > https://github.com/apache/spark/pull/1907
> >
> > @dbtsai explained it to me a bit yesterday that in 1.1 RDDs are no longer
> > sent through akka msgs but over http-channels...If there is a document
> > detailing the architecture that is currently in-place (like how the core
> > changed from 1.0 to 1.1) it will help a lot in debugging the jobs which
> are
> > built upon the libraries like mllib and optimize them further for
> > efficiency...
> >
> > For using the Spark actor system directly:
> >
> > I spent few weeks December 2013 to make the Scalafish code (
> > https://github.com/azymnis/scalafish) operational on 10 nodes...It uses
> > scalding for matrix partitioning and actorSystem to coordinate the
> > updates...It is a cool use of akka but getting an actor system
> operational
> > is difficult...
> >
> > Since Spark already has tested version of actor system running on both
> > standalone and yarn modes, I am planning to port scalafish to spark using
> > actor model...That's one of the use-cases I am looking for...
> >
> > Another use-case that I am considering is to send msgs directly from
> kafka
> > queues to spark actorSystem for processing to get Storm like
> > latency...basically window sizes of 1-2 ms and no overhead of using an
> RDD
> > if possible...
> >
> > Thanks.
> > Deb
> >
> >
> > On Wed, Aug 20, 2014 at 1:42 PM, Patrick Wendell <pwend...@gmail.com>
> wrote:
> >
> >> Hey Deb,
> >>
> >> Can you be specific what changes you are mentioning? We have not, to my
> >> knowledge, made major architectural changes around akka use.
> >>
> >> I think in general we don't want people to be using Spark's actor system
> >> directly - it is an internal communication component in Spark and could
> >> e.g. be re-factored later to not use akka at all. Could you elaborate a
> bit
> >> more on your use case?
> >>
> >> - Patrick
> >>
> >>
> >> On Wed, Aug 20, 2014 at 9:02 AM, Debasish Das <debasish.da...@gmail.com
> >
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> There have been some recent changes in the way akka is used in spark
> and I
> >>> feel they are major changes...
> >>>
> >>> Is there a design document / JIRA / experiment on large datasets that
> >>> highlight the impact of changes (1.0 vs 1.1) ? Basically it will be
> great
> >>> to understand where akka is used in the code base...
> >>>
> >>> If I don't have to broadcast big variables but use akka's programming
> >>> model
> >>> (use actors directly) on Spark's actorsystem is that allowed ? I
> >>> understand
> >>> that it might look hacky :-)
> >>>
> >>> Thanks.
> >>> Deb
> >>>
> >>
> >>
>

Reply via email to