I think that feature makes sense, because starting a JVM has overhead.
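
Just to make the idea concrete, here is a minimal sketch of what such an in-process "personality" could look like: tasks run as threads in a pool inside the TaskTracker JVM instead of in forked child JVMs. All class and method names below are made up for illustration; they are not existing Hadoop APIs.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch only -- illustrative names, not Hadoop APIs.
public class InProcessTaskRunner {

    // Stand-in for a unit of map or reduce work; in Hadoop this would be
    // driven by the task's input split and the job's Mapper/Reducer class.
    public interface Task extends Runnable {}

    private final ExecutorService pool;

    public InProcessTaskRunner(int taskSlots) {
        // One worker thread per task slot, replacing one child JVM per task.
        this.pool = Executors.newFixedThreadPool(taskSlots);
    }

    public void launch(Task task) {
        // No fork, no classpath setup, no JVM startup latency: the task
        // begins executing as soon as a worker thread is free.
        pool.execute(task);
    }

    public void shutdown() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}

The obvious trade-off is isolation: a task that leaks memory or calls System.exit() can take the whole TaskTracker down with it, which is part of why Hadoop forks a child JVM per task in the first place.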
On Sun, Jun 1, 2008 at 4:26 AM, Christophe Taton <[EMAIL PROTECTED]> wrote:
> Actually Hadoop could be made more friendly to such realtime Map/Reduce
> jobs. For instance, we could consider running all tasks inside the task
> tracker JVM as separate threads, which could be implemented as another
> personality of the TaskRunner. I was looking into this a couple of
> weeks ago...
> Would you be interested in such a feature?
>
> Christophe T.
>
> On Sun, Jun 1, 2008 at 10:08 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>> Hadoop is highly optimized towards handling datasets that are much too
>> large to fit into memory. That means that many trade-offs have been
>> made that make it much less useful for very short jobs or for jobs
>> that would fit into memory easily.
>>
>> Multi-core implementations of map-reduce are very interesting for a
>> number of applications, as are in-memory implementations for
>> distributed architectures. I don't think that anybody really knows yet
>> how well these other implementations will play with Hadoop. The
>> regimes they are designed to optimize for are very different in terms
>> of data scale, number of machines, and networking speed. All of these
>> constraints drive the design in innumerable ways.
>>
>> On Sat, May 31, 2008 at 7:51 PM, Martin Jaggi <[EMAIL PROTECTED]> wrote:
>>> Concerning real-time Map Reduce within (and not only between)
>>> machines (multi-core & GPU), e.g. the Phoenix and Mars frameworks:
>>>
>>> I'm really interested in very fast Map Reduce tasks, i.e. without
>>> much disk access. With the rise of multi-core systems, this could get
>>> more and more interesting, and could maybe even lead to something
>>> like 'super-computing for everyone', or is that a bit overwhelming?
>>> Anyway, I was pleasantly surprised to see the recent Phoenix
>>> (http://csl.stanford.edu/~christos/sw/phoenix/) implementation of Map
>>> Reduce for multi-core CPUs (they won the best paper award at
>>> HPCA'07).
>>>
>>> Recently GPU computing was also in the news again, pushed by Nvidia
>>> (see CUDA: http://www.nvidia.com/object/cuda_showcase.html), and now
>>> a Map Reduce implementation for GPUs called Mars has become
>>> available: http://www.cse.ust.hk/gpuqp/Mars_tr.pdf
>>> The Mars authors say at the end of their paper: "We are also
>>> interested in integrating Mars into the existing Map Reduce
>>> implementations such as Hadoop so that the Map Reduce framework can
>>> take the advantage of the parallelism among different machines as
>>> well as the parallelism within each machine."
>>>
>>> What do you think of this, especially about the multi-core approach?
>>> Do you think these needs are already served by the current
>>> InMemoryFileSystem of Hadoop, or not? Are there any plans to
>>> integrate one of the two frameworks above? Or would this already be
>>> addressed by reducing the significant intermediate data pairs
>>> overhead (https://issues.apache.org/jira/browse/HADOOP-3366)?
>>>
>>> Any comments?
