Running 0.13.1 and hitting this very predictably: some tasks keep timing out. The pattern is like this:

- tasktracker says reduce task is not responding:

  2007-10-20 18:40:28,225 INFO org.apache.hadoop.mapred.TaskTracker: task_0006_r_000000_38 0.0% reduce > copy >
  2007-10-20 18:50:36,772 INFO org.apache.hadoop.mapred.TaskTracker: task_0006_r_000000_38: Task failed to report status for 608 seconds. Killing.

- but reduce task is chugging away:

  2007-10-20 18:46:18,070 INFO org.apache.hadoop.mapred.ReduceTask: task_0006_r_000000_38 Copying task_0006_m_000003_0 output from hadoop037.sf2p.facebook.com.
  2007-10-20 18:46:28,235 INFO org.apache.hadoop.mapred.ReduceTask: task_0006_r_000000_38 done copying task_0006_m_000007_0 output from hadoop021.sf2p.facebook.com.

From the timestamps, the reduce task seems to be working away happily when the tasktracker times it out. Is there a relevant patch I should apply?

Help appreciated - this is wreaking havoc.

Thx,
Joydeep
