Re: Realtime Map Reduce = Supercomputing for the Masses?

Christophe Taton Sun, 01 Jun 2008 01:26:40 -0700

Actually, Hadoop could be made friendlier to such realtime Map/Reduce
jobs.
For instance, we could consider running all tasks inside the task tracker
JVM as separate threads, which could be implemented as another personality
of the TaskRunner.
I looked into this a couple of weeks ago...
Would you be interested in such a feature?
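
To make the idea concrete, here is a minimal sketch of running map tasks as
threads inside a single JVM instead of forking one child JVM per task. It is
only an illustration under stated assumptions: the class ThreadedTaskRunner
and its methods mapTask and run are hypothetical names, not part of Hadoop's
TaskRunner API, and word counting over in-memory strings stands in for a
real job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: NOT Hadoop's TaskRunner; all names are assumptions.
class ThreadedTaskRunner {

    // "Map task": counts the words of one in-memory input split.
    static Map<String, Integer> mapTask(String split) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Runs one map task per split on a thread pool inside the current JVM,
    // then merges the partial counts (a single in-memory "reduce" step).
    static Map<String, Integer> run(List<String> splits, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, Integer>>> futures = new ArrayList<>();
            for (String split : splits) {
                Callable<Map<String, Integer>> task = () -> mapTask(split);
                futures.add(pool.submit(task));
            }
            Map<String, Integer> total = new TreeMap<>();
            for (Future<Map<String, Integer>> future : futures) {
                for (Map.Entry<String, Integer> e : future.get().entrySet()) {
                    total.merge(e.getKey(), e.getValue(), Integer::sum);
                }
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("a task failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling ThreadedTaskRunner.run(Arrays.asList("a b a", "b c"), 2) would count
the words of both splits on two threads. The trade-off is the usual one:
threads avoid JVM start-up cost for short jobs, but a misbehaving task is no
longer isolated from the others in its own process.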


Christophe T.


On Sun, Jun 1, 2008 at 10:08 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:

> Hadoop is highly optimized towards handling datasets that are much too
> large to fit into memory.  That means that there are many trade-offs that
> have been made that make it much less useful for very short jobs or jobs
> that would fit into memory easily.
>
> Multi-core implementations of map-reduce are very interesting for a number
> of applications as are in-memory implementations for distributed
> architectures.  I don't think that anybody really knows yet how well these
> other implementations will play with Hadoop.  The regimes that they are
> designed to optimize are very different in terms of data scale, number of
> machines and networking speed.  All of these constraints drive the design
> in innumerable ways.
>
> On Sat, May 31, 2008 at 7:51 PM, Martin Jaggi <[EMAIL PROTECTED]> wrote:
>
> > Concerning real-time Map Reduce within (and not only between) machines
> > (multi-core & GPU), e.g. the Phoenix and Mars frameworks:
> >
> > I'm really interested in very fast Map Reduce tasks, i.e. without much
> > disk access. With the rise of multi-core systems, this could get more and
> > more interesting, and could maybe even lead to something like
> > 'super-computing for everyone', or is that a bit overwhelming? Anyway I
> > was nicely surprised to see the recent Phoenix
> > (http://csl.stanford.edu/~christos/sw/phoenix/) implementation of Map
> > Reduce for multi-core CPUs (they won the best paper award at HPCA'07).
> >
> > GPU computing was also in the news again recently, pushed by Nvidia
> > (check CUDA: http://www.nvidia.com/object/cuda_showcase.html ), and a Map
> > Reduce implementation for GPUs called Mars has now become available:
> > http://www.cse.ust.hk/gpuqp/Mars_tr.pdf
> > The Mars people say at the end of their paper: "We are also interested in
> > integrating Mars into the existing Map Reduce implementations such as
> > Hadoop so that the Map Reduce framework can take the advantage of the
> > parallelism among different machines as well as the parallelism within
> > each machine."
> >
> > What do you think of this, especially about the multi-core approach? Do
> > you think these needs are already served by the current InMemoryFileSystem
> > of Hadoop or not? Are there any plans for 'integrating' one of the two
> > frameworks above?
> > Or would it already be achieved by reducing the significant overhead of
> > intermediate data pairs
> > (https://issues.apache.org/jira/browse/HADOOP-3366 )?
> >
> > Any comments?
> >
>
