I'm trying to run a TeraSort job to confirm that my cluster is set up
correctly. The map phase runs fine, but in the reduce phase all of the data
is sent to a single node. My mapred.reduce.tasks parameter is set to an
appropriate value greater than 1, and multiple reducers are launched, but
only one of them receives any input.

It looks as if the TeraSort partition function is buggy, but I can't believe
it would have a bug this obvious. I've checked for configuration errors on my
part and found none. So I'm asking whether anyone else has seen this problem
and can explain it.

In the archives from February 27 of this year, David Salle's post "TeraSort
bug?" (http://grokbase.com/t/hadoop.apache.org/common-user/2011/02/terasort-bug/27pzea46iowbfkbd4l5y566i4iv4)
describes what appears to be the same problem, but the only response I see
is from David Salle himself the next day, apologizing and saying to ignore
his previous post. Presumably he found a mistake on his end that he
considered trivial, but it doesn't look trivial to me.
