If we stay with Hadoop, then just nuking task failure tolerance and assuming that all reducers are live would allow a very fast and simple push model for memory-resident algorithms.
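Roughly what I have in mind (a minimal sketch of the push idea, not actual Hadoop API; the endpoint handling and class names here are just illustrative): each mapper keeps its partitioned output in memory and pushes it straight to a fixed set of live reducer endpoints, instead of spilling to local disk and waiting for the reducers to pull it in the shuffle. If a reducer dies, the whole job just fails, which is the trade we'd be making by dropping task failure tolerance.

    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Socket;
    import java.util.ArrayList;
    import java.util.List;

    public class PushShuffleSketch {

      // Reducer endpoints are assumed known up front and never restarted
      // (i.e. task failure tolerance is switched off).
      private final List<InetSocketAddress> reducers;
      private final List<List<String>> buffers; // one in-memory partition per reducer

      public PushShuffleSketch(List<InetSocketAddress> reducers) {
        this.reducers = reducers;
        this.buffers = new ArrayList<>();
        for (int i = 0; i < reducers.size(); i++) {
          buffers.add(new ArrayList<>());
        }
      }

      // Map-side emit: partition by key hash, buffer in memory only.
      public void emit(String key, String value) {
        int partition = (key.hashCode() & Integer.MAX_VALUE) % reducers.size();
        buffers.get(partition).add(key + "\t" + value);
      }

      // Push each in-memory partition directly to its reducer over TCP;
      // no local disk spill, no pull phase. A dead reducer fails the job.
      public void flush() throws IOException {
        for (int i = 0; i < reducers.size(); i++) {
          InetSocketAddress addr = reducers.get(i);
          try (Socket socket = new Socket(addr.getAddress(), addr.getPort());
               DataOutputStream out = new DataOutputStream(socket.getOutputStream())) {
            for (String record : buffers.get(i)) {
              out.writeUTF(record);
            }
          }
        }
      }
    }

The point is just that once you assume the reducers stay up for the life of the job, the whole spill/merge/pull machinery becomes optional for memory-resident workloads.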
On Fri, Sep 16, 2011 at 12:14 AM, Jake Mannix <[email protected]> wrote:
> The problem with raw Hadoop jobs which are iterative is that they launch
> multiple jobs, which can get executed on whatever machines the JobTracker
> sends them to, with open mapper slots. An in-memory HDFS would still have
> files living at various locations, not necessarily the same as where all of
> the mappers go, which means the chunks need to get moved over to local disk
> of the mapper nodes. Now if the entire HDFS-accessible filesystem is on a
> memory-mapped filesystem, it would still go to memory, I guess, but this
> doesn't seem like a very efficient process: Hadoop is optimized for streaming
> over big files, and the map/reduce shuffle requires a lot of disk (in this
> case, memory!) to do what it does as well.
>
