If we stay with Hadoop, then just nuking task failure tolerance and assuming
that all reducers are live would allow a very fast and simple push model for
memory-resident algorithms.
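
A very rough sketch of the kind of push model I mean (nothing Hadoop-specific,
just plain sockets over a fixed peer list, and it only works because we get to
assume every peer stays up for the whole run):

// Sketch only, not a Hadoop API. Assumes every "reducer" process is known up
// front and never fails, so partial updates can be pushed straight to peers
// in memory instead of going through the disk-based shuffle.
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;
import java.util.List;

public class PushModelSketch {

  /** Hypothetical peer descriptor: host/port of a live, memory-resident reducer. */
  public static final class Peer {
    final String host;
    final int port;
    Peer(String host, int port) { this.host = host; this.port = port; }
  }

  /**
   * Push one iteration's partial update (say, a partial gradient or a row
   * block) directly to every peer, assuming they are all alive and listening.
   */
  public static void pushUpdate(List<Peer> peers, byte[] partialUpdate) throws IOException {
    for (Peer peer : peers) {
      try (Socket socket = new Socket(peer.host, peer.port);
           DataOutputStream out = new DataOutputStream(socket.getOutputStream())) {
        out.writeInt(partialUpdate.length);
        out.write(partialUpdate);
      }
      // No retry or re-execution logic here on purpose: dropping task failure
      // tolerance is exactly what lets the sender assume the receiver is up,
      // keeping the whole exchange in memory and off the disk path.
    }
  }
}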

On Fri, Sep 16, 2011 at 12:14 AM, Jake Mannix <[email protected]> wrote:

> The problem with raw Hadoop jobs which are iterative is that they launch
> multiple jobs, which can get executed on whatever machines the JobTracker
> sends them to, with open mapper slots.  An in-memory HDFS would still have
> files living at various locations, not necessarily the same as where all of
> the mappers go, which means the chunks need to get moved over to local disk
> of the mapper nodes.  Now if the entire HDFS-accessible-filesystem is on a
> memory-mapped filesystem, it would still go to memory, I guess, but this
> doesn't sound like a very efficient process: Hadoop is optimized for streaming
> over big files, and the map/reduce shuffle requires a lot of disk (in this
> case, memory!) to do what it does as well.
>
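
For reference, the iterative pattern being described boils down to a driver
loop like the sketch below (identity mapper/reducer and the paths are just
placeholders): every pass is a separate job submission, so task placement,
HDFS reads of the input splits, and the full sort/shuffle cost are paid again
on each iteration.

// Minimal sketch of an iterative MapReduce driver; the mapper, reducer, and
// paths are placeholders standing in for real per-iteration logic.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IterativeDriverSketch {

  /** Placeholder identity mapper. */
  public static class IterMapper extends Mapper<Text, Text, Text, Text> { }

  /** Placeholder identity reducer. */
  public static class IterReducer extends Reducer<Text, Text, Text, Text> { }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path input = new Path("/data/iter-0");   // hypothetical HDFS path
    int iterations = 10;                     // illustrative iteration count

    for (int i = 0; i < iterations; i++) {
      Path output = new Path("/data/iter-" + (i + 1));

      Job job = Job.getInstance(conf, "iteration-" + i);
      job.setJarByClass(IterativeDriverSketch.class);
      job.setInputFormatClass(KeyValueTextInputFormat.class);
      job.setMapperClass(IterMapper.class);
      job.setReducerClass(IterReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(Text.class);
      FileInputFormat.addInputPath(job, input);
      FileOutputFormat.setOutputPath(job, output);

      // Every waitForCompletion() is a brand-new job: task placement is chosen
      // again by the JobTracker, the input splits are re-read from HDFS
      // (possibly from remote DataNodes), and a full map-side spill plus
      // reduce-side merge happens in between.
      if (!job.waitForCompletion(true)) {
        throw new IllegalStateException("iteration " + i + " failed");
      }
      input = output;  // the next pass reads the previous pass's output
    }
  }
}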
