vikash kumar created HADOOP-10145:
-------------------------------------

             Summary: Reduce task stuck on 0.16666667%
                 Key: HADOOP-10145
                 URL: https://issues.apache.org/jira/browse/HADOOP-10145
             Project: Hadoop Common
          Issue Type: Bug
          Components: conf
    Affects Versions: 0.20.2
         Environment: OS: RHEL 6.4
                      Hadoop version: 0.20.2-cdh3u6
            Reporter: vikash kumar

All of a sudden, one of our Hadoop jobs is stuck: the reduce phase takes forever to complete. We have waited 30 hours for a job that usually completes in about an hour. At times, however, resubmitting the same job works fine. In the TaskTracker logs I see a large number of messages like the following:

2013-12-04 00:00:00,381 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159167_r_000041_0 0.16666667% reduce > copy (1 of 2 at 0.01 MB/s) >
2013-12-04 00:00:00,750 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159167_r_000048_0 0.16666667% reduce > copy (1 of 2 at 0.01 MB/s) >
2013-12-04 00:00:01,729 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159262_r_000046_0 0.16666667% reduce > copy (1 of 2 at 0.03 MB/s) >
2013-12-04 00:00:01,918 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159262_r_000055_0 0.16666667% reduce > copy (1 of 2 at 0.03 MB/s) >
2013-12-04 00:00:01,919 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159262_r_000021_0 0.16666667% reduce > copy (1 of 2 at 0.03 MB/s) >
2013-12-04 00:00:01,922 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159262_r_000031_0 0.16666667% reduce > copy (1 of 2 at 0.03 MB/s) >
2013-12-04 00:00:01,940 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159262_r_000057_0 0.16666667% reduce > copy (1 of 2 at 0.03 MB/s) >
2013-12-04 00:00:02,443 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159167_r_000047_0 0.16666667% reduce > copy (1 of 2 at 0.01 MB/s) >

There are no other reasonable clues in the logs to point me in a direction, and I am not sure what I should be looking for. With my setup, upgrading to a newer version is not an option. Please help!

--
This message was sent by Atlassian JIRA
(v6.1#6144)
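
For anyone triaging from the same kind of TaskTracker log, here is a minimal sketch of how the copy-phase progress lines quoted in this report could be grouped per job to see which reduce attempts are stalled and at what copy rate. It is only an illustration: the regular expression assumes exactly the line format shown in the excerpt, and the names `LINE_RE`, `SAMPLE`, and `summarize` are hypothetical helpers, not part of Hadoop. The job ID is derived from the attempt ID (`attempt_<jobtracker start>_<job number>_r_...` belongs to `job_<jobtracker start>_<job number>`).

```python
import re
from collections import defaultdict

# Assumption: matches only the copy-phase progress lines in the format
# quoted in this report; other TaskTracker lines are ignored.
LINE_RE = re.compile(
    r"INFO org\.apache\.hadoop\.mapred\.TaskTracker: "
    r"(?P<attempt>attempt_(?P<jt>\d+)_(?P<job>\d+)_r_\d+_\d+) "
    r"(?P<progress>[\d.]+)% reduce > copy "
    r"\((?P<done>\d+) of (?P<total>\d+) at (?P<rate>[\d.]+) MB/s\)"
)

# Three lines copied from the log excerpt in this report.
SAMPLE = """\
2013-12-04 00:00:00,381 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159167_r_000041_0 0.16666667% reduce > copy (1 of 2 at 0.01 MB/s) >
2013-12-04 00:00:01,729 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159262_r_000046_0 0.16666667% reduce > copy (1 of 2 at 0.03 MB/s) >
2013-12-04 00:00:01,918 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201310070546_159262_r_000055_0 0.16666667% reduce > copy (1 of 2 at 0.03 MB/s) >
"""


def summarize(lines):
    """Group copy-phase reports by job.

    Returns a dict mapping job ID to the set of reduce attempts seen
    and the lowest copy rate (MB/s) reported for that job.
    """
    jobs = defaultdict(lambda: {"attempts": set(), "min_rate": float("inf")})
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a copy-phase progress line
        job_id = "job_{}_{}".format(m.group("jt"), m.group("job"))
        entry = jobs[job_id]
        entry["attempts"].add(m.group("attempt"))
        entry["min_rate"] = min(entry["min_rate"], float(m.group("rate")))
    return dict(jobs)


if __name__ == "__main__":
    for job_id, info in sorted(summarize(SAMPLE.splitlines()).items()):
        print(job_id, len(info["attempts"]), "reduce attempt(s) in copy phase,",
              "lowest rate", info["min_rate"], "MB/s")
```

Run against a full TaskTracker log (for example by passing an open file object to `summarize`), this would show whether the stalled copies are confined to particular jobs, which may help narrow down where to look next.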