PS: One potential downside (or upside, perhaps) is that they are going
to introduce ZooKeeper as a dependency of the task management substrate.
On the bright side, as far as I understand, that would solve the
NameNode SPOF problem, among other things.

On Tue, Mar 8, 2011 at 6:04 PM, Dmitriy Lyubimov <[email protected]> wrote:
> On Tue, Mar 8, 2011 at 3:26 PM, Sean Owen <[email protected]> wrote:
>> Looks interesting -- it looks like a specialization for iterative
>
>> Hadoop is, in the end, a tool that was never conceived for general
>> distributed computation. But among frameworks it's (relatively) well
>> understood and available. It seems like Mahout has taken on the
>> mission of delivering something that works on the framework that's out
>> there now, which is a practical rather than theoretically-motivated
>> goal. (I think it's a good goal too.) I see that as a difference from
>> many research-oriented projects.
>>
>
> At the last HUG they rolled out plans (preliminary alpha ETA: summer)
> to separate the task management substrate from the application
> substrate. I.e., once you have your task allocation & data/rack
> affinity refactored as a standalone concern, you can run MR or even
> MPI or whatever distributed data flow your heart desires.
>
> That's IMO good news for stuff like mahout-math. A lot of the time,
> matrix jobs require something that is currently emulated by map-only
> passes, or they have to resort to a reduction when all that is needed
> is a sequential merge without the sort component.
>
> So I think brighter days are ahead (for Mahout in particular).
>
