I noticed when reading http://wiki.apache.org/hadoop/HardwareBenchmarks the following comment:
"I ran into some odd behavior on Herd2 where if i [ . . . ] the reducers don't start until the mappers finish, slowing the job significantly." This puzzled me. I don't see how reducers can ever start before the mappers have finished. I thought that any given call to a reducer would supply all the (key,value) pairs for a given value of the key. How can a reducer start before all the different values for a key are known? And thus how can a reducer start before all the mappers have finished? - Marc
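For reference, the reduce contract described in the question can be sketched as a toy simulation (this is illustrative Python, not Hadoop's actual implementation; the function names are made up). It shows why, semantically, a reduce call for a key cannot run until every mapper has emitted its output: the value list for that key is only complete after the shuffle has seen all map output.

```python
from itertools import groupby
from operator import itemgetter

def run_mapreduce(records, mapper, reducer):
    # Phase 1: run every mapper. The shuffle below depends on ALL
    # map output existing, which is the dependency Marc is asking about.
    intermediate = []
    for record in records:
        intermediate.extend(mapper(record))
    # Phase 2 (shuffle/sort): group every (key, value) pair by key.
    intermediate.sort(key=itemgetter(0))
    results = {}
    for key, pairs in groupby(intermediate, key=itemgetter(0)):
        # Phase 3: each reduce call sees the complete value list for its key.
        results[key] = reducer(key, [v for _, v in pairs])
    return results

# Classic word-count example of the contract.
def mapper(line):
    return [(word, 1) for word in line.split()]

def reducer(word, counts):
    return sum(counts)

print(run_mapreduce(["a b a", "b c"], mapper, reducer))
# → {'a': 2, 'b': 2, 'c': 1}
```

In this sketch the three phases are strictly sequential; real Hadoop reducers can begin their copy/shuffle work earlier, but the actual reduce calls still wait for complete per-key value lists.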
