[ http://issues.apache.org/jira/browse/HADOOP-707?page=comments#action_12451007 ]
Johan Oskarson commented on HADOOP-707:
---------------------------------------
The jobtracker web UI shows the task as still running, and so does the
tasktracker web UI. However, the only processes running on that node are a
datanode and a tasktracker, so no task process is actually left.
In the tasktracker log on that node:
2006-11-18 01:39:53,722 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0 06/11/18 01:39:53 WARN mapred.TaskRunner: java.net.SocketTimeoutException: timed out waiting for rpc response
2006-11-18 01:39:53,722 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0 at org.apache.hadoop.ipc.Client.call(Client.java:460)
2006-11-18 01:39:53,722 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:164)
2006-11-18 01:39:53,722 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0 at org.apache.hadoop.mapred.$Proxy0.progress(Unknown Source)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0 at org.apache.hadoop.mapred.Task.reportProgress(Task.java:173)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0 at org.apache.hadoop.mapred.Task.reportProgress(Task.java:162)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0 at org.apache.hadoop.mapred.MapTask$3.next(MapTask.java:200)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:215)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0 at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1247)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: task_0197_m_000137_0
2006-11-18 01:39:54,166 INFO org.apache.hadoop.mapred.TaskTracker: task_0197_m_000137_0 0.8654954% /user/hadoop/data/submissions/1160000000/1163000000/1163730000/1163730000:30169435+6033887
2006-11-18 01:39:54,166 INFO org.apache.hadoop.mapred.TaskRunner: task_0194_m_001875_0 done; removing files.
2006-11-18 01:39:54,894 INFO org.apache.hadoop.mapred.TaskTracker: task_0197_m_000137_0 1.0% /user/hadoop/data/submissions/1160000000/1163000000/1163730000/1163730000:30169435+6033887
2006-11-18 01:39:54,894 INFO org.apache.hadoop.mapred.TaskRunner: task_0194_m_001888_0 done; removing files.
2006-11-18 01:39:55,268 INFO org.apache.hadoop.mapred.TaskTracker: Task task_0197_m_000137_0 is done.
And a bit further down:
2006-11-18 01:39:56,257 WARN org.apache.hadoop.ipc.Server: handler output error
java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:125)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:294)
    at org.apache.hadoop.ipc.SocketChannelOutputStream.flushBuffer(SocketChannelOutputStream.java:108)
    at org.apache.hadoop.ipc.SocketChannelOutputStream.write(SocketChannelOutputStream.java:89)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
    at java.io.DataOutputStream.flush(DataOutputStream.java:106)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:532)
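
For anyone who wants to check their own tasktracker logs for the same pattern, here is a rough sketch in Python. It is not part of Hadoop: the function name summarize and the matching rules are only assumptions based on the log lines quoted in this comment, and it expects one log entry per line, as in the log file itself. It collects task IDs that logged an RPC timeout and task IDs that the tasktracker later reported as done. Tasks in both sets are the ones worth comparing against the jobtracker web UI.

```python
import re

# Task attempt IDs as they appear in the log excerpts above,
# e.g. task_0197_m_000137_0 (format assumed from those lines only).
TASK_ID = re.compile(r"task_\d+_[mr]_\d+_\d+")


def summarize(lines):
    """Return (timed_out, done) sets of task IDs found in tasktracker log lines."""
    timed_out, done = set(), set()
    for line in lines:
        match = TASK_ID.search(line)
        if not match:
            continue
        task = match.group(0)
        # The child task's progress RPC timed out.
        if "SocketTimeoutException" in line:
            timed_out.add(task)
        # The tasktracker itself reported the task as finished.
        if line.rstrip().endswith("is done."):
            done.add(task)
    return timed_out, done


sample = [
    "2006-11-18 01:39:53,722 INFO org.apache.hadoop.mapred.TaskRunner: "
    "task_0197_m_000137_0 06/11/18 01:39:53 WARN mapred.TaskRunner: "
    "java.net.SocketTimeoutException: timed out waiting for rpc response",
    "2006-11-18 01:39:54,166 INFO org.apache.hadoop.mapred.TaskRunner: "
    "task_0194_m_001875_0 done; removing files.",
    "2006-11-18 01:39:55,268 INFO org.apache.hadoop.mapred.TaskTracker: "
    "Task task_0197_m_000137_0 is done.",
]

timed_out, done = summarize(sample)
suspects = sorted(timed_out & done)
print(suspects)
```

On the three sample lines, only task_0197_m_000137_0 ends up in both sets. Whether the timeout is actually what leaves the task marked as running is not established by this; the script only narrows down which tasks to look at.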
> Final map task gets stuck
> -------------------------
>
> Key: HADOOP-707
> URL: http://issues.apache.org/jira/browse/HADOOP-707
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.8.0
> Environment: using latest trunk
> Reporter: Johan Oskarson
> Priority: Critical
>
> I've seen numerous jobs lately where the final map task gets stuck, never
> finishing.
> The jobtracker doesn't reassign the task. A restart of the tasktracker solves
> the issue and the job can finish.
> In the web interface it turns up as:
> task_0028_m_000534_0 node17.herd1 RUNNING 0.00% 10-Nov-2006 12:21:12 10-Nov-2006 12:22:19 (1mins, 6sec)
> Task failed to report status for 604 seconds. Killing.
> The only exception I can find in that tasktracker's log is this (it appears
> a few times):
> java.nio.channels.ClosedChannelException
>     at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:125)
>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:294)
>     at org.apache.hadoop.ipc.SocketChannelOutputStream.flushBuffer(SocketChannelOutputStream.java:108)
>     at org.apache.hadoop.ipc.SocketChannelOutputStream.write(SocketChannelOutputStream.java:89)
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>     at java.io.DataOutputStream.flush(DataOutputStream.java:106)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:532)