    [ http://issues.apache.org/jira/browse/HADOOP-707?page=comments#action_12451007 ]

Johan Oskarson commented on HADOOP-707:
---------------------------------------

The jobtracker UI shows the task as still running. The tasktracker web UI also shows the task as running. However, on that node there is only a datanode and a tasktracker running.

In the tasktracker log on that node:

2006-11-18 01:39:53,722 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0 06/11/18 01:39:53 WARN mapred.TaskRunner: java.net.SocketTimeoutException: timed out waiting for rpc response
2006-11-18 01:39:53,722 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0 at org.apache.hadoop.ipc.Client.call(Client.java:460)
2006-11-18 01:39:53,722 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:164)
2006-11-18 01:39:53,722 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0 at org.apache.hadoop.mapred.$Proxy0.progress(Unknown Source)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0 at org.apache.hadoop.mapred.Task.reportProgress(Task.java:173)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0 at org.apache.hadoop.mapred.Task.reportProgress(Task.java:162)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0 at org.apache.hadoop.mapred.MapTask$3.next(MapTask.java:200)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:215)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0 at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1247)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0
2006-11-18 01:39:54,166 INFO org.apache.hadoop.mapred.TaskTracker: task_0197_m_000137_0 0.8654954% /user/hadoop/data/submissions/1160000000/1163000000/1163730000/1163730000:30169435+6033887
2006-11-18 01:39:54,166 INFO org.apache.hadoop.mapred.TaskRunner: task_0194_m_001875_0 done; removing files.
2006-11-18 01:39:54,894 INFO org.apache.hadoop.mapred.TaskTracker: task_0197_m_000137_0 1.0% /user/hadoop/data/submissions/1160000000/1163000000/1163730000/1163730000:30169435+6033887
2006-11-18 01:39:54,894 INFO org.apache.hadoop.mapred.TaskRunner: task_0194_m_001888_0 done; removing files.
2006-11-18 01:39:55,268 INFO org.apache.hadoop.mapred.TaskTracker: Task task_0197_m_000137_0 is done.

And a bit further down:

2006-11-18 01:39:56,257 WARN org.apache.hadoop.ipc.Server: handler output error
java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:125)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:294)
    at org.apache.hadoop.ipc.SocketChannelOutputStream.flushBuffer(SocketChannelOutputStream.java:108)
    at org.apache.hadoop.ipc.SocketChannelOutputStream.write(SocketChannelOutputStream.java:89)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
    at java.io.DataOutputStream.flush(DataOutputStream.java:106)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:532)

> Final map task gets stuck
> -------------------------
>
>                 Key: HADOOP-707
>                 URL: http://issues.apache.org/jira/browse/HADOOP-707
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.8.0
>         Environment: using latest trunk
>            Reporter: Johan Oskarson
>            Priority: Critical
>
> I've seen numerous jobs lately where the final map task gets stuck, never finishing.
> The jobtracker doesn't reassign the task. A restart of the tasktracker solves the issue and the job can finish.
> In the web interface it turns up as:
> task_0028_m_000534_0 node17.herd1 RUNNING 0.00% 10-Nov-2006 12:21:12 10-Nov-2006 12:22:19 (1mins, 6sec)
> Task failed to report status for 604 seconds. Killing.
> Only exception I find in that tasktracker log is this (a few times):
> java.nio.channels.ClosedChannelException
>     at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:125)
>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:294)
>     at org.apache.hadoop.ipc.SocketChannelOutputStream.flushBuffer(SocketChannelOutputStream.java:108)
>     at org.apache.hadoop.ipc.SocketChannelOutputStream.write(SocketChannelOutputStream.java:89)
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>     at java.io.DataOutputStream.flush(DataOutputStream.java:106)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:532)

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira