Hi,

When I run the wordcount example, I get nearly 100% CPU utilization during the Map phase, but the Reduce phase takes forever and never rises above 1-2% utilization. Looking at the code, the Reduce isn't very complicated, so I'm not sure why it's so slow.
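For reference, the per-word work in the wordcount reduce step amounts to adding up the partial counts for each word. The sketch below is plain Java written only to illustrate that logic. It is not the Hadoop 0.12.3 WordCount source and does not use the Hadoop Reducer interface. The class name, method name, and sample data are hypothetical.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java sketch of the wordcount reduce logic (not the Hadoop API).
public class WordCountReduceSketch {

    // Reduce for a single word: sum every partial count emitted for it.
    static int sumCounts(Iterator<Integer> counts) {
        int sum = 0;
        while (counts.hasNext()) {
            sum += counts.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical map output after grouping by key: word -> partial counts.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        grouped.put("int", List.of(1, 1, 1));
        grouped.put("return", List.of(1, 1));

        // Emit "word<TAB>total" for each key, in sorted key order.
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            System.out.println(e.getKey() + "\t" + sumCounts(e.getValue().iterator()));
        }
    }
}
```

The per-key computation is a single linear pass over the values, so the CPU work in the reduce function itself is small.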
Here is my configuration:

- Hadoop 0.12.3
- 13 tasktracker/datanode nodes (6 slow, 7 fast); I get this behavior with any number of task/data nodes
- 1 jobtracker/namenode
- 2459 input files, 28 MB total (a bunch of C code from the Linux kernel)
- 37 map tasks
- 1 reduce task

I've seen some other posts to this list about similar problems with wordcount, but they didn't quite match my situation. Any ideas why Map would be fast and Reduce so slow?

Thanks!
-steve
