Actually, Hadoop could be made friendlier to such real-time Map/Reduce jobs. For instance, we could consider running all tasks inside the TaskTracker JVM as separate threads, which could be implemented as another personality of the TaskRunner. I looked into this a couple of weeks ago... Would you be interested in such a feature?
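To make the thread-per-task idea concrete, here is a minimal sketch in plain Java, assuming a word-count job whose input splits are already in memory. It is not Hadoop code and does not touch the TaskRunner: `InProcessMapReduce`, `mapSplit` and `wordCount` are hypothetical names chosen for illustration. It only shows map tasks running as threads of one JVM, with the reduce step merging their partial results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these names are part of Hadoop.
class InProcessMapReduce {

    // Map task for one split: count the words in a single chunk of text.
    static Map<String, Integer> mapSplit(String split) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Runs one map task per split as a thread in this JVM, then reduces
    // the partial counts in the calling thread.
    static Map<String, Integer> wordCount(List<String> splits, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every map task; no child process is forked.
            List<Future<Map<String, Integer>>> partials = new ArrayList<>();
            for (String split : splits) {
                partials.add(pool.submit(() -> mapSplit(split)));
            }
            // Reduce: sum the per-split counts for each word.
            Map<String, Integer> total = new TreeMap<>();
            for (Future<Map<String, Integer>> partial : partials) {
                for (Map.Entry<String, Integer> e : partial.get().entrySet()) {
                    total.merge(e.getKey(), e.getValue(), Integer::sum);
                }
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for map tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a map task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `InProcessMapReduce.wordCount(Arrays.asList("a b a", "b c"), 2)` runs two map tasks on a two-thread pool. The trade-off of this design is that tasks share one heap, so a misbehaving task is no longer isolated from the others the way a separate child JVM would isolate it.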
Christophe T.

On Sun, Jun 1, 2008 at 10:08 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:

> Hadoop is highly optimized towards handling datasets that are much too large
> to fit into memory. That means that there are many trade-offs that have
> been made that make it much less useful for very short jobs or jobs that
> would fit into memory easily.
>
> Multi-core implementations of map-reduce are very interesting for a number
> of applications as are in-memory implementations for distributed
> architectures. I don't think that anybody really knows yet how well these
> other implementations will play with Hadoop. The regimes that they are
> designed to optimize are very different in terms of data scale, number of
> machines and networking speed. All of these constraints drive the design in
> innumerable ways.
>
> On Sat, May 31, 2008 at 7:51 PM, Martin Jaggi <[EMAIL PROTECTED]> wrote:
>
> > Concerning real-time Map Reduce within (and not only between) machines
> > (multi-core & GPU), e.g. the Phoenix and Mars frameworks:
> >
> > I'm really interested in very fast Map Reduce tasks, i.e. without much disk
> > access. With the rise of multi-core systems, this could get more and more
> > interesting, and could maybe even lead to something like 'super-computing
> > for everyone', or is that a bit overwhelming? Anyway I was nicely surprised
> > to see the recent Phoenix
> > (http://csl.stanford.edu/~christos/sw/phoenix/)
> > implementation of Map Reduce for multi-core CPUs (they won the best paper
> > award at HPCA'07).
> >
> > Recently GPU computing was also in the news again, pushed by Nvidia (check
> > CUDA http://www.nvidia.com/object/cuda_showcase.html ), and now a Map
> > Reduce implementation called Mars has become available there as well:
> > http://www.cse.ust.hk/gpuqp/Mars_tr.pdf
> > The Mars people say at the end of their paper "We are also interested in
> > integrating Mars into the existing Map Reduce implementations such as Hadoop
> > so that the Map Reduce framework can take the advantage of the parallelism
> > among different machines as well as the parallelism within each machine."
> >
> > What do you think of this, especially about the multi-core approach? Do you
> > think these needs are already served by the current InMemoryFileSystem of
> > Hadoop or not? Are there any plans of 'integrating' one of the two above
> > frameworks?
> > Or would it already be done by improving the significant intermediate data
> > pairs overhead (https://issues.apache.org/jira/browse/HADOOP-3366 )?
> >
> > Any comments?
