Twister is applicable to tiny clusters only.  There is no fault tolerance,
and all data is streamed from map to reduce.  There is also no distributed
store (it depends on NFS or local data copies).

That streaming is highly effective for I/O-bound algorithms like k-means on
small clusters.  Small clusters make failure tolerance much less important,
of course.  More recent versions of Hadoop do a much better job of avoiding
a spill to disk and are likely to show much better performance.

The other win for Twister is that map tasks are preserved across multiple
iterations, so each task can keep its input in memory rather than reloading
it on every pass.  This clearly helps some algorithms (most notably k-means).
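
To make the map-task point concrete, here is a rough sketch of iterative
k-means with long-lived map tasks.  This is not Twister's API; the names
CachedMapTask, reduce_partials and kmeans are made up for illustration, and
everything runs in one process.  The only thing it is meant to show is that
each task loads its partition once and only the centroids change between
iterations.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


class CachedMapTask:
    """Hypothetical long-lived map task that keeps its partition in memory."""

    def __init__(self, partition: Sequence[Point]) -> None:
        # Loaded once when the task is created, then reused every iteration.
        self.points: List[Point] = list(partition)

    def map(self, centroids: Sequence[Point]):
        """Return per-centroid coordinate sums and point counts."""
        dim = len(centroids[0])
        sums = [[0.0] * dim for _ in centroids]
        counts = [0] * len(centroids)
        for p in self.points:
            # Assign the point to its nearest centroid (squared Euclidean).
            best = min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
            )
            counts[best] += 1
            for d in range(dim):
                sums[best][d] += p[d]
        return sums, counts


def reduce_partials(partials, old_centroids: Sequence[Point]) -> List[Point]:
    """Combine partial sums from all map tasks into new centroids."""
    dim = len(old_centroids[0])
    new_centroids: List[Point] = []
    for k, old in enumerate(old_centroids):
        total = sum(counts[k] for _, counts in partials)
        if total == 0:
            # No points assigned: keep the previous centroid.
            new_centroids.append(tuple(old))
            continue
        new_centroids.append(
            tuple(sum(sums[k][d] for sums, _ in partials) / total for d in range(dim))
        )
    return new_centroids


def kmeans(tasks: Sequence[CachedMapTask], centroids: Sequence[Point],
           iterations: int = 10) -> List[Point]:
    """Run a fixed number of iterations over the same map task objects."""
    current = [tuple(c) for c in centroids]
    for _ in range(iterations):
        partials = [task.map(current) for task in tasks]
        current = reduce_partials(partials, current)
    return current
```

In stock Hadoop each iteration would be a separate job that rereads its
input; in the sketch the task objects simply outlive the loop, which is the
effect being described.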

I would say that overall, this is nice but definitely not ready for prime
time.

On Wed, Feb 10, 2010 at 10:10 AM, Robin Anil <robin.a...@gmail.com> wrote:

> Well, Things seems to be heating up. We better start refactoring :)
>
>
