Re: Reduce Performance

Doug Cutting Wed, 22 Aug 2007 11:24:19 -0700

Thorsten Schuett wrote:

In my case, it looks as if the loopback device is the bottleneck. So
increasing the number of tasks won't help.

Hmm. I have trouble believing that the loopback device is actually the bottleneck. What makes you think that it is?

To better support standalone use of Hadoop on multicore boxes, perhaps we should promote the MiniMR cluster code from test into the core. This runs the tasktracker and jobtracker in the same process. It still forks processes for tasks, and has all the features of a grid setup: web UI, task restarting, etc.

I don't think we should spend much effort adding multi-threading to LocalRunner, since it lacks so many of the other features of TaskTracker/JobTracker. We should also avoid re-implementing those features. Thus running TaskTracker and JobTracker in the same JVM seems like a good strategy for multicore support.

If performance with a MiniMR cluster is not good, then we should determine why. We could, e.g., benchmark and profile sort performance in this configuration. Again, I have a hard time believing that loopback bandwidth is a bottleneck. If it is, then perhaps we can optimize around it, but let's first be sure that's the case.
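One cheap way to check the loopback question before profiling Hadoop itself is to measure raw TCP throughput over 127.0.0.1 on the same box. What follows is a minimal sketch, not part of Hadoop: the class name `LoopbackBench`, the 64 KB buffer and the 1 GiB transfer size are illustrative assumptions. It streams a fixed number of bytes from a client socket to a server socket on the loopback interface and reports the rate it observed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper (not a Hadoop class): rough loopback TCP throughput check.
class LoopbackBench {

    /** Sends totalBytes over loopback; returns {bytesReceived, elapsedNanos}. */
    static long[] run(long totalBytes) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free ephemeral port.
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            final long[] received = new long[1];
            // Receiver: read until the sender closes its end, counting bytes.
            Thread reader = new Thread(() -> {
                try (Socket s = server.accept(); InputStream in = s.getInputStream()) {
                    byte[] buf = new byte[64 * 1024];
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        received[0] += n;
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            reader.start();

            long start = System.nanoTime();
            // Sender: write totalBytes of zeros in 64 KB chunks, then close.
            try (Socket client = new Socket(loopback, server.getLocalPort());
                 OutputStream out = client.getOutputStream()) {
                byte[] chunk = new byte[64 * 1024];
                long remaining = totalBytes;
                while (remaining > 0) {
                    int len = (int) Math.min(chunk.length, remaining);
                    out.write(chunk, 0, len);
                    remaining -= len;
                }
            }
            // join() makes the reader's count visible to this thread.
            reader.join();
            long elapsed = System.nanoTime() - start;
            return new long[] {received[0], elapsed};
        }
    }

    public static void main(String[] args) throws Exception {
        long total = 1L << 30; // 1 GiB
        long[] r = run(total);
        double seconds = r[1] / 1e9;
        System.out.printf("%d bytes in %.2f s = %.1f MB/s%n",
                r[0], seconds, r[0] / seconds / 1e6);
    }
}
```

If the rate this prints is far above the shuffle throughput actually seen in a MiniMR run, loopback is unlikely to be the limiting factor, and profiling the sort itself is the better next step.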

Note that, when running standalone, even with TaskTracker and JobTracker, one need not use HDFS. Direct access to the local filesystem will probably be considerably faster.


Doug
