Reduce tips complete 100%, but job does not complete saying reduces still
running.
----------------------------------------------------------------------------------
Key: HADOOP-2167
URL: https://issues.apache.org/jira/browse/HADOOP-2167
Project: Hadoop
Issue Type: Bug
Reporter: Amareshwari Sri Ramadasu
Assignee: Arun C Murthy
Priority: Critical
The job's reduce progress is stuck at 99.43%, with 2 reduces still shown in the running state, and the job never completes.
However, the reduce task list on the JobTracker shows these reduces as 100% complete and marked SUCCEEDED, and their finish times are available on jobtasks.jsp and in the job history.
With ipc.client.timeout = 600000, the TaskTrackers (TTs) running the reduces throw the following exceptions.
On one of the TTs, the logs show:
2007-11-07 08:34:16,092 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200711070637_0001_r_000150_0 is done.
2007-11-07 08:35:34,013 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200711070637_0001_r_000156_0 is done.
2007-11-07 08:42:44,751 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: java.net.SocketTimeoutException: timedout waiting for rpc response
    at org.apache.hadoop.ipc.Client.call(Client.java:484)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
    at org.apache.hadoop.mapred.$Proxy0.heartbeat(Unknown Source)
    at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:897)
    at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:799)
    at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1193)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2055)
2007-11-07 08:42:44,767 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to .................
On the other TT:
2007-11-07 08:40:30,484 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200711070637_0001_r_000160_0 is done.
2007-11-07 08:42:45,508 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: java.net.SocketTimeoutException: timedout waiting for rpc response
    at org.apache.hadoop.ipc.Client.call(Client.java:484)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
    at org.apache.hadoop.mapred.$Proxy0.heartbeat(Unknown Source)
    at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:897)
    at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:799)
    at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1193)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2055)
2007-11-07 08:42:45,508 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to ..........
In the JobTracker (JT) logs, the reduce task completes successfully:
2007-11-07 06:39:09,151 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200711070637_0001_r_000160_0' to tip tip_200711070637_0001_r_000160, for tracker 'x'
2007-11-07 08:42:45,708 INFO org.apache.hadoop.mapred.TaskRunner: Saved output of task 'task_200711070637_0001_r_000160_0' to 'y'
2007-11-07 08:42:45,708 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200711070637_0001_r_000160_0' has completed tip_200711070637_0001_r_000160 successfully.
This suggests that when tasks finish before the heartbeat RPC times out, the problem lies in the progress update. The behavior is not consistent, however: other reduce tasks in the same situation complete successfully.
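To make the timing concrete, here is a small sketch that checks whether each "is done" timestamp from the TT logs falls inside the window during which the failed heartbeat RPC was outstanding. It assumes the timed-out heartbeat was issued roughly ipc.client.timeout (600000 ms) before the SocketTimeoutException was logged; that is an approximation, not something the logs confirm. The class and method names are hypothetical, and only the timestamps are taken from the logs above.

```java
import java.time.Duration;
import java.time.LocalTime;

public class HeartbeatWindow {
    // ipc.client.timeout value from the report, in milliseconds
    static final Duration RPC_TIMEOUT = Duration.ofMillis(600_000);

    // Parse the time-of-day part of a log timestamp such as "08:42:44,751"
    static LocalTime parse(String logTime) {
        return LocalTime.parse(logTime.replace(',', '.'));
    }

    // Assumption: the timed-out heartbeat was sent about RPC_TIMEOUT before the exception was logged
    static LocalTime estimatedSendTime(String exceptionTime) {
        return parse(exceptionTime).minus(RPC_TIMEOUT);
    }

    // True if the task finished after the heartbeat was sent but before the RPC timed out
    static boolean doneWhileHeartbeatPending(String doneTime, String exceptionTime) {
        LocalTime done = parse(doneTime);
        return done.isAfter(estimatedSendTime(exceptionTime))
                && done.isBefore(parse(exceptionTime));
    }

    public static void main(String[] args) {
        // {task suffix, "is done" time, SocketTimeoutException time} copied from the TT logs
        String[][] rows = {
            {"r_000150_0", "08:34:16,092", "08:42:44,751"},
            {"r_000156_0", "08:35:34,013", "08:42:44,751"},
            {"r_000160_0", "08:40:30,484", "08:42:45,508"},
        };
        for (String[] r : rows) {
            System.out.printf("%s done at %s, heartbeat sent ~%s, done while pending: %b%n",
                    r[0], r[1], estimatedSendTime(r[2]), doneWhileHeartbeatPending(r[1], r[2]));
        }
    }
}
```

Under that assumption, all three reduces finish while a single heartbeat call is blocked, so their completion would only reach the JT through the resent status, which matches the JT recording r_000160 as complete at 08:42:45,708, 200 ms after the resend.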