PS: One potential downside (or upside, perhaps) is that they are going to introduce ZooKeeper as a dependency of the task management substrate. On the bright side, that would also solve the NameNode single point of failure (SPOF) problem (as far as I understand), among other things.
On Tue, Mar 8, 2011 at 6:04 PM, Dmitriy Lyubimov wrote:
> On Tue, Mar 8, 2011 at 3:26 PM, Sean Owen wrote:
>> Looks interesting -- it looks like a specialization for iterative
>
>> Hadoop is, in the end, a tool that was never conceived for general
>> distributed computation. But among frameworks it's (relatively) well
>> understood and available. It seems like Mahout has taken on the
>> mission of delivering something that works on the framework that's out
>> there now, which is a practical rather than theoretically-motivated
>> goal. (I think it's a good goal too.) I see that as a difference from
>> many research-oriented projects.
>>
>
> At the last HUG they rolled out plans (preliminary alpha, ETA summer) to
> separate the task management substrate from the application substrate.
> I.e., once you've got task allocation and data/rack affinity refactored
> into a standalone concern, you can run MR, or even MPI, or whatever
> distributed data flow your heart desires.
>
> That's IMO good news for stuff like mahout-math. A lot of the time,
> matrix jobs require something that is currently emulated with map-only
> passes, or has to resort to a reduce phase when all that's really needed
> is a sequential merge without a sort component.
>
> So I think brighter days are ahead (for Mahout in particular).
>
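To make the "sequential merge without a sort component" point concrete: if each map-only pass already emits its rows sorted by row index, combining them only needs a streaming k-way merge, not a full shuffle-and-sort reduce. Below is a minimal Python sketch under that assumption. `merge_sorted_blocks` and the (row index, values) layout are hypothetical illustrations, not Mahout or Hadoop APIs.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical row layout: (row index, dense row values).
Row = Tuple[int, List[float]]


def merge_sorted_blocks(blocks: Iterable[Iterable[Row]]) -> Iterator[Row]:
    """Stream-merge row blocks that are each already sorted by row index.

    Assumption: every block (e.g. the output of one map-only pass over a
    contiguous split) is sorted by key, so a k-way merge over the block
    heads is enough. No global sort of all rows is performed.
    """
    # heapq.merge only compares the current head of each block, so it
    # works lazily and never materializes or re-sorts the whole input.
    return heapq.merge(*blocks, key=lambda row: row[0])


if __name__ == "__main__":
    block_a = [(0, [1.0, 0.0]), (2, [0.5, 0.5])]
    block_b = [(1, [0.0, 1.0]), (3, [2.0, 2.0])]
    for index, values in merge_sorted_blocks([block_a, block_b]):
        print(index, values)
```

The design point is cost: merging k pre-sorted blocks of n total rows is O(n log k), while a sort-based reduce pays O(n log n) plus the shuffle, which is wasted work when the inputs are already ordered.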
