Re: Discussion Of ML environment/MR, Mahout

Jake Mannix Mon, 11 Mar 2013 17:47:03 -0700

On Mon, Mar 11, 2013 at 4:59 PM, Ted Dunning <[email protected]> wrote:


> Yarn by itself won't fix this problem.  Yarn + Spark would fix it.  But,
> then again, so would Mesos + Spark or AmigaOS + Spark.
>
> Should we open several additional modules which are parallel to core but
> which depend on alternatives like Giraph or Spark instead?  That would
> allow experimentation to proceed.
>

+1 to this.

Also: a Pig module, for folks who like to work with Pig (even though
it's still regular MR underneath, it's an abstraction layer lots of
people work with).


>
> On Mon, Mar 11, 2013 at 1:43 PM, Dmitriy Lyubimov <[email protected]>
> wrote:
>
> > On Mon, Mar 11, 2013 at 1:24 PM, Sebastian Schelter <[email protected]>
> > wrote:
> >
> > > Ideally, as the implementor of a machine learning library, you wouldn't
> > > want to think about how to execute joins most efficiently. It's data
> > > dependent anyway in most cases. You would want an optimizer, similar to
> > > the ones used in databases, that takes your map-reduce data flow and
> > > figures out the best way to execute it.
> > >
> >
> > And that's exactly the case I was referring to as MR being "too low
> > level api".
> >
> > That's why I turned to Spark, at least in a cautious, investigative way:
> > it promises a higher-level (flume-like) API, in-memory caching (avoiding
> > restarts and excessive I/O in pipelines), and the ability to combine with
> > Bagel primitives on the same intermediate dataset (which, as far as I
> > understand, is exactly what Ted said: sort-less redistribution to
> > buckets). It is so much richer.
> >
> > I understand that in the space of Mahout, we probably will have to wait
> > for the promise of hybrid APIs in Yarn etc. (Hadoop-native stuff), but
> > isn't really what would solve iterative structured and interconnected
> > stuff?
> >
> >
> > >
> > > On 11.03.2013 21:16, Ted Dunning wrote:
> > > > Kinda sorta..
> > > >
> > > > You can defeat most of the sort if you want to just hash things to
> > > > buckets.
> > > >
> > > > On Mon, Mar 11, 2013 at 12:01 PM, Dmitriy Lyubimov <[email protected]>
> > > > wrote:
> > > >
> > > > >> The sort component adds a log factor to the asymptotic complexity,
> > > > >> whereas it is clear that any streaming merge algorithm just wouldn't
> > > > >> need to sort and could capitalize on the structure we already know.
> > > > >> (Sure, you can do it map-side with specific streaming join logic, but
> > > > >> that would not be pure MR but rather some map task acrobatics.)
> > > >>
> > > >
> > >
> > >
> >
>
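
To make the sort-versus-hash point in the quoted messages concrete, here is a
minimal Python sketch. It is not from the thread and is not Mahout, Hadoop, or
Spark code; the function names (hash_partition, hash_join, merge_join) are
hypothetical, and everything runs in memory on small lists. It shows two ways
to join without a general sort: hashing records to buckets, and a single
streaming pass over inputs that are assumed to be sorted already.

```python
from collections import defaultdict


def hash_partition(records, num_buckets):
    """Distribute (key, value) records into buckets by hash of the key.

    No ordering is imposed, so the cost is linear in the number of records.
    """
    buckets = [[] for _ in range(num_buckets)]
    for key, value in records:
        buckets[hash(key) % num_buckets].append((key, value))
    return buckets


def hash_join(left, right, num_buckets=4):
    """Join two unsorted inputs bucket by bucket, without sorting either one."""
    results = []
    left_buckets = hash_partition(left, num_buckets)
    right_buckets = hash_partition(right, num_buckets)
    # Equal keys always land in the same bucket index on both sides.
    for lb, rb in zip(left_buckets, right_buckets):
        table = defaultdict(list)
        for key, value in lb:
            table[key].append(value)
        for key, value in rb:
            for left_value in table.get(key, ()):
                results.append((key, left_value, value))
    return results


def merge_join(left_sorted, right_sorted):
    """Join two inputs that are ALREADY sorted by key in one linear pass.

    Assumption for brevity: keys are unique within each input.
    """
    results = []
    i = j = 0
    while i < len(left_sorted) and j < len(right_sorted):
        left_key, left_value = left_sorted[i]
        right_key, right_value = right_sorted[j]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            results.append((left_key, left_value, right_value))
            i += 1
            j += 1
    return results
```

The hash variant never orders the data, which is the "hash things to buckets"
idea; the merge variant exploits structure that is already known, which is the
streaming-merge case where a shuffle-time sort would add an avoidable log
factor.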



-- 

  -jake
