On Mon, Mar 11, 2013 at 4:59 PM, Ted Dunning <[email protected]> wrote:
> Yarn by itself won't fix this problem.  Yarn + Spark would fix it.  But,
> then again, so would Mesos + Spark or AmigaOS + Spark.
>
> Should we open several additional modules which are parallel to core but
> which depend on alternatives like Giraph or Spark instead?  That would
> allow experimentation to proceed.
>

+1 to this.  Also: a pig module, for folks who like to work with pig (as
much as it's still regular MR, it's an abstraction layer lots of people
work with).

> On Mon, Mar 11, 2013 at 1:43 PM, Dmitriy Lyubimov <[email protected]>
> wrote:
>
> > On Mon, Mar 11, 2013 at 1:24 PM, Sebastian Schelter <[email protected]>
> > wrote:
> >
> > > Ideally, as an implementor of a machine learning library you wouldn't
> > > want to think about how to most efficiently execute joins. It's data
> > > dependent anyway in most cases. You would want to have an optimizer
> > > similar to the ones used in databases that takes your map reduce data
> > > flow and figures out the best way to execute it.
> > >
> >
> > And that's exactly the case which i was referring to as MR being "too
> > low level api".
> >
> > That's why i turned to spark, at least in a cautious investigative way,
> > because of the promise to provide higher level API (flume-like) and
> > being cached in memory (restart/excessive I/O in pipelines) and
> > combining with Bagel primitives on the same intermediate dataset
> > (which, as far as i understand, is exactly what Ted said, sort-less
> > redistribution to buckets). It is so much richer.
> >
> > I understand that in the space of Mahout, we probably will have to wait
> > for the promise of hybrid apis in Yarn etc. hadoop native stuff, but
> > isn't really what would solve iterative structured and interconnected
> > stuff?
> >
> > >
> > > On 11.03.2013 21:16, Ted Dunning wrote:
> > > > Kinda sorta..
> > > >
> > > > You can defeat most of the sort if you want to just hash things to
> > > > buckets.
> > > >
> > > > On Mon, Mar 11, 2013 at 12:01 PM, Dmitriy Lyubimov <[email protected]>
> > > > wrote:
> > > >
> > > >> Sort component adds log to the asymptotic complexity, whereas it
> > > >> is clear that any streaming merge algorithm just wouldn't need to
> > > >> do sort and capitalize on the structure we already know. (sure,
> > > >> you can do it map-side with a specific streaming join logic but
> > > >> that would not be pure MR but rather some map task acrobatics).
> > > >>
> > > >
> > >
> >

--

  -jake
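
For readers following the thread, here is a minimal Python sketch of the two sort-free ideas quoted above: hashing records to buckets, and a streaming merge join over inputs that are already sorted by key. The names `hash_partition` and `merge_join` and the sample data are illustrative assumptions, not Mahout, Hadoop, or Spark APIs, and the sketch runs in a single process rather than across map and reduce tasks.

```python
from itertools import groupby
from operator import itemgetter


def hash_partition(records, num_buckets):
    """Redistribute (key, value) records to buckets by hashing the key.

    One pass and no comparison sort: records sharing a key land in the
    same bucket, but nothing is ordered within or across buckets.
    """
    buckets = [[] for _ in range(num_buckets)]
    for key, value in records:
        buckets[hash(key) % num_buckets].append((key, value))
    return buckets


def merge_join(left, right):
    """Inner-join two iterables of (key, value) pairs already sorted by key.

    Streams through both inputs once, relying on their existing order
    instead of sorting them again.
    """
    left_groups = groupby(left, key=itemgetter(0))
    right_groups = groupby(right, key=itemgetter(0))
    lkey, lgroup = next(left_groups, (None, None))
    rkey, rgroup = next(right_groups, (None, None))
    while lgroup is not None and rgroup is not None:
        if lkey < rkey:
            # Left key has no match yet; advance the left side only.
            lkey, lgroup = next(left_groups, (None, None))
        elif lkey > rkey:
            # Right key has no match yet; advance the right side only.
            rkey, rgroup = next(right_groups, (None, None))
        else:
            # Keys match: buffer the right-hand values for this key only,
            # then pair them with every left-hand value for the same key.
            right_values = [value for _, value in rgroup]
            for _, left_value in lgroup:
                for right_value in right_values:
                    yield lkey, left_value, right_value
            lkey, lgroup = next(left_groups, (None, None))
            rkey, rgroup = next(right_groups, (None, None))


if __name__ == "__main__":
    left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
    right = [(2, "x"), (3, "y"), (4, "z"), (4, "w")]
    print(list(merge_join(left, right)))
    print(hash_partition(left, 2))
```

Bucketing touches each record once, and the merge join reads each input once (plus the size of the joined output), so neither pays the extra log factor that a full sort of the records would add. The merge join only works because both inputs are assumed to arrive already ordered by key.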
