Reduce Performance

Thorsten Schuett Sat, 18 Aug 2007 04:18:31 -0700

Hi,

First of all, thanks for Hadoop. It's amazing how much you can get done with
a small Hadoop job.


My setup is a little bit different from the usual one. I have a mid-sized
Opteron machine with the data residing on a local RAID. I configured
LocalFileSystem and 2 map + 2 reduce tasks per core.

During the reduce phase I see rather low copy rates in the web interface
and <50% CPU usage in total. vmstat shows that Hadoop constantly reads at
~10-20MB/s and writes in short bursts at higher speeds (>100MB/s).
Neither the disks nor the CPUs seem to be the bottleneck.

What's interesting, though, is the traffic on the loopback device. There is
constant traffic of the same order as the read rate mentioned above. Please
correct me if I am wrong, but it looks like Hadoop is using the RPC
mechanism to copy the map output files to the reduce tasks (in this case via
the loopback device). If my assumptions are correct, would it be possible to
read/access the files directly in this "one-node mode"?
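
In case anyone wants to reproduce the loopback observation, here is a
minimal sketch of how the rate can be sampled. It assumes Linux and the
usual /proc/net/dev layout (receive bytes in the first column after the
interface name, transmit bytes in the ninth). The function names are my own
and are not part of Hadoop.

```python
import time


def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        # Assumed layout: field 0 = received bytes, field 8 = transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def loopback_rate(interval=1.0, path="/proc/net/dev", iface="lo"):
    """Sample the counters twice and return (rx, tx) in MB/s for iface."""
    with open(path) as f:
        before = parse_net_dev(f.read())[iface]
    time.sleep(interval)
    with open(path) as f:
        after = parse_net_dev(f.read())[iface]
    return tuple((b - a) / interval / 1e6 for a, b in zip(before, after))
```

Calling loopback_rate() while the reduce phase is copying, and comparing the
result with the read rate reported by vmstat, should show whether the two
really move together.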

Thanks,
  Thorsten
