I have Hadoop jobs in which the last reducer randomly hangs because it never
receives any map output. By "randomly" I mean that the job sometimes works
correctly, but sometimes its last reducer keeps polling for map output and
always gets 0 outputs back. It stays stuck like this, for up to 100 hours,
until I kill it. After I kill the job and re-run it, it may run correctly. The
hung reducer can be on any machine in my cluster.
The tail of the problematic reducer's log is below. Does anybody have a hint
about what is happening?

Reducer syslog (tail):
2009-04-09 21:57:46,445 INFO org.apache.hadoop.mapred.ReduceTask: task_200902022141_50382_r_000008_0 Need 15 map output(s)
2009-04-09 21:57:46,446 INFO org.apache.hadoop.mapred.ReduceTask: task_200902022141_50382_r_000008_0: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
2009-04-09 21:57:46,446 INFO org.apache.hadoop.mapred.ReduceTask: task_200902022141_50382_r_000008_0 Got 0 known map output location(s); scheduling...
2009-04-09 21:57:46,446 INFO org.apache.hadoop.mapred.ReduceTask: task_200902022141_50382_r_000008_0 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
2009-04-09 21:57:51,453 INFO org.apache.hadoop.mapred.ReduceTask: task_200902022141_50382_r_000008_0 Need 15 map output(s)
2009-04-09 21:57:51,460 INFO org.apache.hadoop.mapred.ReduceTask: task_200902022141_50382_r_000008_0: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
2009-04-09 21:57:51,460 INFO org.apache.hadoop.mapred.ReduceTask: task_200902022141_50382_r_000008_0 Got 0 known map output location(s); scheduling...
2009-04-09 21:57:51,460 INFO org.apache.hadoop.mapred.ReduceTask: task_200902022141_50382_r_000008_0 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
... (forever)
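In this log the reducer polls roughly every 5 seconds, still needs 15 map outputs, and learns of 0 new output locations each time, so the "Need 15" count never moves. One way to notice this state without waiting hours is to scan the reducer syslog for a pending count that stays the same across consecutive polls. The sketch below is a minimal Python illustration, assuming log lines in the format quoted above. The function name `find_stalled_reducers` and the `min_polls` threshold are hypothetical, and treating "unchanged count" as "stuck" is a heuristic of this sketch, not something Hadoop itself reports.

```python
import re
from collections import defaultdict

# Matches the "<task id> Need N map output(s)" message that the reducer logs
# on each shuffle poll, as in the syslog excerpt above.
NEED_RE = re.compile(r"(task_\S+) Need (\d+) map output\(s\)")


def find_stalled_reducers(lines, min_polls=3):
    """Return {task_id: (pending, polls)} for each reduce task whose final run
    of identical "Need N" counts is at least `min_polls` polls long, ignoring
    tasks with nothing left to fetch.

    Heuristic (an assumption, not a Hadoop feature): in a healthy shuffle the
    pending count eventually drops, so a count that never changes suggests
    the reducer is waiting for map outputs it never learns about.
    """
    last = {}                  # task_id -> most recent pending count
    streak = defaultdict(int)  # task_id -> consecutive polls at that count
    for line in lines:
        match = NEED_RE.search(line)
        if match is None:
            continue  # not a "Need N map output(s)" line
        task, pending = match.group(1), int(match.group(2))
        if last.get(task) == pending:
            streak[task] += 1
        else:
            last[task] = pending
            streak[task] = 1
    return {
        task: (last[task], polls)
        for task, polls in streak.items()
        if polls >= min_polls and last[task] > 0
    }


# Demo on lines copied from the syslog above (two polls only, so min_polls=2).
sample = [
    "2009-04-09 21:57:46,445 INFO org.apache.hadoop.mapred.ReduceTask: "
    "task_200902022141_50382_r_000008_0 Need 15 map output(s)",
    "2009-04-09 21:57:46,446 INFO org.apache.hadoop.mapred.ReduceTask: "
    "task_200902022141_50382_r_000008_0 Got 0 known map output location(s); scheduling...",
    "2009-04-09 21:57:51,453 INFO org.apache.hadoop.mapred.ReduceTask: "
    "task_200902022141_50382_r_000008_0 Need 15 map output(s)",
]
print(find_stalled_reducers(sample, min_polls=2))
# {'task_200902022141_50382_r_000008_0': (15, 2)}
```

Since the function accepts any iterable of lines, an open log file can be passed directly; in practice `min_polls` would be set well above 2 (e.g. enough polls to cover several minutes) so a briefly slow shuffle is not flagged.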