Or the other way around: I've been thinking of the Mahout jobs as Pig primitives.

On Tue, Mar 8, 2011 at 3:26 PM, Sean Owen <[email protected]> wrote:
> Looks interesting -- it looks like a specialization for iterative
> algorithms of a certain kind, a kind which describes a lot of
> algorithms. Is this distributed? It looked more like it's intended for
> high-performance machines. I guess it's also different being C++-based
> and not Hadoop-based.
>
> Hadoop is, in the end, a tool that was never conceived for general
> distributed computation. But among frameworks it's (relatively) well
> understood and available. It seems like Mahout has taken on the
> mission of delivering something that works on the framework that's out
> there now, which is a practical rather than theoretically-motivated
> goal. (I think it's a good goal too.) I see that as a difference from
> many research-oriented projects.
>
> Beyond that it is the same sort of thing and that's good.
>
> The thing I "worry" most that is being duplicated is actually Pig. It
> at least gives something more like "primitives" for basic
> information-shuffling operations on Hadoop like the sorts of pivots
> and joins and filters that go into your standard implementation of an
> ML algorithm. I bet we'd find we'd be better off bringing in some
> stuff from Pig rather than reinvent the join a few times over.
>
> But first things first... would really be good to focus on revamping
> and bringing together what we have already to pull together
> commonality and such before thinking what we can improve about those
> commonalities.
>
>
> On Tue, Mar 8, 2011 at 11:07 PM, Shannon Quinn <[email protected]> wrote:
>> Being the newbie on the block, forgive me if I'm rehashing old news: has
>> anything seen/heard of GraphLab before?
>>
>> http://www.graphlab.ml.cmu.edu/index.html
>>
>> It's written by someone who has an office in the same exact building as I
>> do, just one floor up, so I'll certainly be talking to him soon. But if
>> there is someone here who is familiar with this work, can you elaborate on
>> the differences between it and Mahout? He seems to have somewhat tweaked the
>> standard map/reduce paradigm into something that offers more crosstalk
>> flexibility between nodes at runtime (at the cost of significant
>> configurational overhead, most likely), but beyond that it seems strikingly
>> similar to the functionality Mahout provides.
>>
>> Anyway, was pointed to this by someone in my department while I was running
>> my coalescing thesis ideas by him.
>>
>> Shannon
>>
>

--
Lance Norskog
[email protected]
