See Giraph.

On Thu, Mar 7, 2013 at 6:01 PM, Andy Twigg <[email protected]> wrote:

> That sounds like a horrid amount of work to do something simple. Is there a
> hadoop implementation of a master-workers problem you can point me to?
> On Mar 7, 2013 9:57 PM, "Ted Dunning" <[email protected]> wrote:
>
> > On Thu, Mar 7, 2013 at 6:25 AM, Andy Twigg <[email protected]> wrote:
> >
> > > ... Right now what we have is a
> > > single-machine procedure for scanning through some data, building a
> > > set of histograms, combining histograms and then expanding the tree.
> > > The next step is to decide the best way to distribute this. I'm not an
> > > expert here, so any advice or help here is welcome.
> > >
> >
> > That sounds good so far.
> >
> >
> > > I think the easiest approach would be to use the mappers to construct
> > > the set of histograms, and then send all histograms for a given leaf
> > > to a reducer, which decides how to expand that leaf. The code I have
> > > can almost be ported as-is to a mapper and reducer in this way.
> > > Would using the distributed cache to send the updated tree be wise, or
> > > is there a better way?
> > >
> >
> > Distributed cache is a very limited thing.  You can only put things in at
> > program launch and they must remain constant throughout the program's run.
> >
> > The problem here is that iterated map-reduce is pretty heinously
> > inefficient.
> >
> > The best candidate approaches for avoiding that are to use a BSP sort of
> > model (see the Pregel paper at
> > http://kowshik.github.com/JPregel/pregel_paper.pdf ) or use an
> > unsynchronized model update cycle the way that Vowpal Wabbit does with
> > all-reduce or the way that Google's deep learning system does.
> >
> > Running these approaches on Hadoop without Yarn or Mesos requires a slight
> > perversion of the map-reduce paradigm, but is quite doable.
> >
>
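For readers following along: the mapper/reducer scheme proposed above can be sketched in-process. This is a hypothetical illustration, not the poster's actual code: mappers emit (leaf_id, histogram) pairs over shards of records, a simulated shuffle groups the partial histograms by leaf, and a reducer merges them and picks the split with the best Gini gain. Binary feature values, dense features, and the Gini criterion are all assumptions made here for brevity.

```python
from collections import Counter, defaultdict

def mapper(records, tree_route):
    """Emit (leaf_id, partial histogram) for one shard of records.
    `tree_route` (assumed helper) maps a record to its current leaf."""
    hists = defaultdict(Counter)
    for x, label in records:
        leaf = tree_route(x)
        for feat, val in enumerate(x):
            hists[leaf][(feat, val, label)] += 1
    return list(hists.items())

def gini(label_counts):
    n = sum(label_counts.values())
    return 1.0 - sum((c / n) ** 2 for c in label_counts.values())

def reducer(leaf, partial_hists):
    """Merge one leaf's histograms and return (best_feature, gain)."""
    merged = Counter()
    for h in partial_hists:
        merged.update(h)
    # Parent label counts: each record contributes one entry per feature,
    # so the entries for feature 0 count each record exactly once
    # (this assumes every record has a value for feature 0).
    parent = Counter()
    for (f, v, y), c in merged.items():
        if f == 0:
            parent[y] += c
    n = sum(parent.values())
    best_feat, best_gain = None, -1.0
    for f in {f for (f, v, y) in merged}:
        sides = defaultdict(Counter)  # feature value -> label counts
        for (g, v, y), c in merged.items():
            if g == f:
                sides[v][y] += c
        child = sum(sum(s.values()) / n * gini(s) for s in sides.values())
        gain = gini(parent) - child
        if gain > best_gain:
            best_feat, best_gain = f, gain
    return best_feat, best_gain

def run_round(shards, tree_route):
    """Simulate one map-reduce round: shuffle partial histograms by leaf,
    then reduce each leaf to its best split."""
    grouped = defaultdict(list)
    for shard in shards:
        for leaf, hist in mapper(shard, tree_route):
            grouped[leaf].append(hist)
    return {leaf: reducer(leaf, hs) for leaf, hs in grouped.items()}
```

This also makes Ted's point concrete: each tree-expansion step is one full map-reduce round, so growing a deep tree means iterated job launches, which is where the inefficiency he mentions comes from.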
