Hi, first of all, thanks for Hadoop. It's amazing how much you can get done with a small Hadoop job.
My setup is a little different from the usual one. I have a mid-sized Opteron machine with the data residing on a local RAID. I configured LocalFileSystem and 2 map + 2 reduce tasks per core.

During the reduce phase I see rather low copy rates in the web interface and less than 50% total CPU usage. vmstat shows that Hadoop constantly reads at ~10-20MB/s and writes in short bursts at higher speeds (>100MB/s). Neither the disks nor the CPUs seem to be the bottleneck.

What's interesting, though, is the traffic on the loopback device: there is constant traffic of the same order as the read rate mentioned above. Please correct me if I am wrong, but it looks like Hadoop is using the RPC mechanism to copy the map output files to the reduce tasks (in this case via the loopback device).

If my assumptions are correct, would it be possible to read/access the files directly in the "one-node mode"?

Thanks,
Thorsten
