Re: Realtime Map Reduce = Supercomputing for the Masses?

Ted Dunning Sun, 01 Jun 2008 01:09:16 -0700

Hadoop is highly optimized for handling datasets that are much too large
to fit into memory.  Many of the resulting trade-offs make it much less
useful for very short jobs or for jobs that would fit into memory easily.


Multi-core implementations of map-reduce are very interesting for a number
of applications, as are in-memory implementations for distributed
architectures.  I don't think that anybody really knows yet how well these
other implementations will play with Hadoop.  The regimes that they are
designed for are very different in terms of data scale, number of
machines and networking speed.  All of these constraints drive the design in
innumerable ways.
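
To make the in-memory regime concrete, here is a minimal sketch of a
map-reduce word count that runs entirely inside one process.  It is only an
illustration, not how Hadoop, Phoenix or Mars are implemented.  The names
map_chunk, merge_counts and in_memory_word_count are made up for this
example, and the thread pool is an assumption chosen for simplicity: in
CPython, CPU-bound work would need a process pool to actually use several
cores.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map phase: count words in one in-memory chunk of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce phase: merge two partial counts into one."""
    left.update(right)
    return left


def in_memory_word_count(lines, workers=4):
    """Split lines into chunks, map them on a pool, then reduce the partials."""
    size = max(1, -(-len(lines) // workers))  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    # Hypothetical choice: a thread pool keeps the sketch simple and portable.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "The end"]
    print(in_memory_word_count(sample, workers=2))
```

Nothing here touches disk or the network, and there is no job scheduling or
fault tolerance, which is exactly the machinery Hadoop pays for in order to
handle data that does not fit into memory.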

On Sat, May 31, 2008 at 7:51 PM, Martin Jaggi <[EMAIL PROTECTED]> wrote:

> Concerning real-time Map Reduce within (and not only between) machines
> (multi-core & GPU), e.g. the Phoenix and Mars frameworks:
>
> I'm really interested in very fast Map Reduce tasks, i.e. without much disk
> access. With the rise of multi-core systems, this could get more and more
> interesting, and could maybe even lead to something like 'super-computing
> for everyone', or is that a bit overwhelming? Anyway I was nicely surprised
> to see the recent Phoenix
> (http://csl.stanford.edu/~christos/sw/phoenix/)
> implementation of Map Reduce for multi-core CPUs (they won the best paper
> award at HPCA'07).
>
> Recently GPU computing was also in the news again, pushed by Nvidia (see
> CUDA: http://www.nvidia.com/object/cuda_showcase.html ), and a Map Reduce
> implementation for GPUs called Mars has now become available:
> http://www.cse.ust.hk/gpuqp/Mars_tr.pdf
> The Mars people say at the end of their paper "We are also interested in
> integrating Mars into the existing Map Reduce implementations such as Hadoop
> so that the Map Reduce framework can take the advantage of the parallelism
> among different machines as well as the parallelism within each machine."
>
> What do you think of this, especially about the multi-core approach? Do you
> think these needs are already served by the current InMemoryFileSystem of
> Hadoop or not? Are there any plans for 'integrating' one of the two above
> frameworks?
> Or would it already be achieved by reducing the significant overhead of
> intermediate data pairs (https://issues.apache.org/jira/browse/HADOOP-3366 )?
>
> Any comments?
>



-- 
ted
