On Tue, Mar 8, 2011 at 3:26 PM, Sean Owen <[email protected]> wrote:
> Looks interesting -- it looks like a specialization for iterative
> Hadoop is, in the end, a tool that was never conceived for general
> distributed computation. But among frameworks it's (relatively) well
> understood and available. It seems like Mahout has taken on the
> mission of delivering something that works on the framework that's out
> there now, which is a practical rather than theoretically-motivated
> goal. (I think it's a good goal too.) I see that as a difference from
> many research-oriented projects.
>

At the last HUG they rolled out plans (preliminary alpha ETA: summer) to
separate the task management substrate from the application substrate.
I.e., once task allocation and data/rack affinity are refactored into a
standalone concern, you can run MR, or even MPI, or whatever distributed
data flow your heart desires.

That's IMO good news for stuff like mahout-math. A lot of the time, matrix
jobs require something that is currently emulated by map-only passes, or
they have to resort to a reduce step when all that is really needed is a
sequential merge with no sort component.

So I think brighter days are ahead (for Mahout in particular).
