+1
On Aug 22, 2007, at 11:23 AM, Doug Cutting wrote:
Thorsten Schuett wrote:
> In my case, it looks as if the loopback device is the bottleneck. So
> increasing the number of tasks won't help.
Hmm. I have trouble believing that the loopback device is actually the
bottleneck. What makes you think that it is?
To better support standalone use of Hadoop on multicore boxes, perhaps
we should promote the MiniMR cluster code from test into the core. This
runs the tasktracker and jobtracker in the same process. It still forks
processes for tasks, and has all the features of a grid setup: web UI,
task restarting, etc.
I don't think we should spend much effort adding multi-threading to
LocalRunner, since it lacks so many of the other features of
TaskTracker/JobTracker. We should also avoid re-implementing those
features. Thus running TaskTracker and JobTracker in the same JVM seems
like a good strategy for multicore support.
If performance with a MiniMR cluster is not good, then we should
determine why. We could, e.g., benchmark and profile sort performance
in this configuration. Again, I have a hard time believing that
loopback bandwidth is a bottleneck. If it is, then perhaps we can
optimize around it, but let's first be sure that's the case.
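
A rough way to put a number on raw loopback bandwidth, independent of
Hadoop, is to push bytes through a TCP connection on the loopback
address and time it. The sketch below is only an illustration: the class
name LoopbackBench, the method transfer, and the 64 KB buffer size are
assumptions, not Hadoop code, and a single stream says nothing about how
the shuffle actually uses the network.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical micro-benchmark; not part of Hadoop.
public class LoopbackBench {

    // Sends totalBytes over one loopback TCP connection and returns
    // the number of bytes the receiving side counted.
    static long transfer(long totalBytes) throws Exception {
        final long[] received = {0};
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, lo)) {
            // Receiver: accept one connection and drain it until EOF.
            Thread reader = new Thread(() -> {
                byte[] buf = new byte[64 * 1024];
                try (Socket s = server.accept();
                     InputStream in = s.getInputStream()) {
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        received[0] += n;
                    }
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            reader.start();

            // Sender: write totalBytes in fixed-size chunks, then close.
            byte[] chunk = new byte[64 * 1024];
            try (Socket s = new Socket(lo, server.getLocalPort());
                 OutputStream out = s.getOutputStream()) {
                long sent = 0;
                while (sent < totalBytes) {
                    int n = (int) Math.min(chunk.length, totalBytes - sent);
                    out.write(chunk, 0, n);
                    sent += n;
                }
            }
            // join() makes the reader's final count visible here.
            reader.join();
        }
        return received[0];
    }

    public static void main(String[] args) throws Exception {
        long total = 256L * 1024 * 1024; // 256 MB, an arbitrary size
        long start = System.nanoTime();
        long got = transfer(total);
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("moved %d bytes in %.3f s (%.1f MB/s)%n",
                got, secs, got / secs / 1e6);
    }
}
```

If the figure this reports is far above what the job moves per second,
loopback is unlikely to be the limit, and profiling the sort itself is
the more useful next step.
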
Note that, when running standalone, even with TaskTracker and
JobTracker, one need not use HDFS. Direct access to the local
filesystem will probably be considerably faster.
Doug