That would indeed be a nice idea: there could be other implementations of TaskRunner suited to special hardware or to in-memory systems. But if the communication remains the same (HDFS with disk access), this would not necessarily make things faster in the shuffling phase etc.
Am 01.06.2008 um 10:26 schrieb Christophe Taton:
Actually Hadoop could be made more friendly to such realtime Map/Reduce jobs. For instance, we could consider running all tasks inside the task tracker JVM as separate threads, which could be implemented as another personality of the TaskRunner. I was looking into this a couple of weeks ago...
Would you be interested in such a feature?
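For what it's worth, that threading idea could look roughly like this. A minimal sketch only, assuming a hypothetical in-process runner; the class and method names are illustrative, not actual Hadoop API:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: run tasks as threads inside the task tracker
// JVM instead of forking a child JVM per task. Names are illustrative.
public class InProcessTaskRunner {
    private final ExecutorService pool;
    private final AtomicInteger completed = new AtomicInteger();

    public InProcessTaskRunner(int slots) {
        // One thread per task slot, all sharing the tracker JVM.
        pool = Executors.newFixedThreadPool(slots);
    }

    // Submitting a task costs only a thread hand-off, not a fork/exec
    // plus JVM startup as in the child-process personality.
    public void submit(Runnable task) {
        pool.submit(() -> {
            task.run();
            completed.incrementAndGet();
        });
    }

    // Wait for all submitted tasks and return how many finished.
    public int awaitAll() {
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }
}
```

The win would be avoiding the per-task JVM startup cost; the obvious trade-off is isolation, since a misbehaving task could now take down the whole task tracker JVM.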
Christophe T.
On Sun, Jun 1, 2008 at 10:08 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
Hadoop is highly optimized towards handling datasets that are much too large to fit into memory. That means that there are many trade-offs that have been made that make it much less useful for very short jobs or jobs that would fit into memory easily.

Multi-core implementations of map-reduce are very interesting for a number of applications, as are in-memory implementations for distributed architectures. I don't think that anybody really knows yet how well these other implementations will play with Hadoop. The regimes that they are designed to optimize are very different in terms of data scale, number of machines and networking speed. All of these constraints drive the design in innumerable ways.
On Sat, May 31, 2008 at 7:51 PM, Martin Jaggi <[EMAIL PROTECTED]> wrote:
Concerning real-time Map Reduce within (and not only between) machines (multi-core & GPU), e.g. the Phoenix and Mars frameworks:

I'm really interested in very fast Map Reduce tasks, i.e. without much disk access. With the rise of multi-core systems, this could get more and more interesting, and could maybe even lead to something like 'supercomputing for everyone', or is that a bit overwhelming? Anyway, I was nicely surprised to see the recent Phoenix (http://csl.stanford.edu/~christos/sw/phoenix/) implementation of Map Reduce for multi-core CPUs (they won the best paper award at HPCA'07).
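The core of the Phoenix approach (workers as threads on one machine, all intermediate pairs kept in shared memory) can be sketched in a few lines. A minimal, hypothetical word-count in Java, not taken from Phoenix itself, which is C-based:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of in-memory, multi-core map-reduce (Phoenix-style):
// map tasks run as parallel threads over in-memory input chunks, and
// the emitted (word, 1) pairs are reduced directly into a shared hash
// table, so intermediate data never touches the disk.
public class InMemoryMapReduce {
    public static Map<String, Integer> wordCount(String[] chunks) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        // Map phase: one parallel task per chunk.
        Arrays.stream(chunks).parallel().forEach(chunk -> {
            for (String word : chunk.split("\\s+")) {
                if (!word.isEmpty()) {
                    // Reduce phase folded in: merge counts atomically.
                    counts.merge(word, 1, Integer::sum);
                }
            }
        });
        return counts;
    }
}
```

There is no shuffle over HDFS here at all, which is exactly why such frameworks are fast for data that fits in memory, and why they address a different regime than Hadoop does.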
Recently GPU computing was in the news again too, pushed by Nvidia (check CUDA: http://www.nvidia.com/object/cuda_showcase.html), and a Map Reduce implementation for GPUs called Mars has now become available as well: http://www.cse.ust.hk/gpuqp/Mars_tr.pdf
The Mars people say at the end of their paper: "We are also interested in integrating Mars into the existing Map Reduce implementations such as Hadoop so that the Map Reduce framework can take the advantage of the parallelism among different machines as well as the parallelism within each machine."
What do you think of this, especially of the multi-core approach? Do you think these needs are already served by the current InMemoryFileSystem of Hadoop, or not? Are there any plans to 'integrate' one of the two frameworks above?
Or would this already be achieved by reducing the significant overhead of intermediate data pairs (https://issues.apache.org/jira/browse/HADOOP-3366)?
Any comments?