[ 
http://issues.apache.org/jira/browse/HADOOP-707?page=comments#action_12451007 ] 
            
Johan Oskarson commented on HADOOP-707:
---------------------------------------

The jobtracker UI shows the task as still running. The tasktracker web UI also 
shows the task as running. However, on that node there are only a datanode and 
a tasktracker running.

In the tasktracker log on that node:

2006-11-18 01:39:53,722 INFO org.apache.hadoop.mapred.TaskRunner: 
task_0197_m_000137_0 06/11/18 01:39:53 WARN mapred.TaskRunner: 
java.net.SocketTimeoutException: timed out waiting for rpc response
2006-11-18 01:39:53,722 INFO org.apache.hadoop.mapred.TaskRunner: 
task_0197_m_000137_0  at org.apache.hadoop.ipc.Client.call(Client.java:460)
2006-11-18 01:39:53,722 INFO org.apache.hadoop.mapred.TaskRunner: 
task_0197_m_000137_0  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:164)
2006-11-18 01:39:53,722 INFO org.apache.hadoop.mapred.TaskRunner: 
task_0197_m_000137_0  at org.apache.hadoop.mapred.$Proxy0.progress(Unknown 
Source)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: 
task_0197_m_000137_0  at 
org.apache.hadoop.mapred.Task.reportProgress(Task.java:173)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: 
task_0197_m_000137_0  at 
org.apache.hadoop.mapred.Task.reportProgress(Task.java:162)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: 
task_0197_m_000137_0  at 
org.apache.hadoop.mapred.MapTask$3.next(MapTask.java:200)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: 
task_0197_m_000137_0  at 
org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: 
task_0197_m_000137_0  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:215)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: 
task_0197_m_000137_0  at 
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1247)
2006-11-18 01:39:53,723 INFO org.apache.hadoop.mapred.TaskRunner: 
task_0197_m_000137_0
2006-11-18 01:39:54,166 INFO org.apache.hadoop.mapred.TaskTracker: 
task_0197_m_000137_0 0.8654954% 
/user/hadoop/data/submissions/1160000000/1163000000/1163730000/1163730000:30169435+6033887
2006-11-18 01:39:54,166 INFO org.apache.hadoop.mapred.TaskRunner: 
task_0194_m_001875_0 done; removing files.
2006-11-18 01:39:54,894 INFO org.apache.hadoop.mapred.TaskTracker: 
task_0197_m_000137_0 1.0% 
/user/hadoop/data/submissions/1160000000/1163000000/1163730000/1163730000:30169435+6033887
2006-11-18 01:39:54,894 INFO org.apache.hadoop.mapred.TaskRunner: 
task_0194_m_001888_0 done; removing files.
2006-11-18 01:39:55,268 INFO org.apache.hadoop.mapred.TaskTracker: Task 
task_0197_m_000137_0 is done.

And a bit further down:

2006-11-18 01:39:56,257 WARN org.apache.hadoop.ipc.Server: handler output error
java.nio.channels.ClosedChannelException
        at 
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:125)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:294)
        at 
org.apache.hadoop.ipc.SocketChannelOutputStream.flushBuffer(SocketChannelOutputStream.java:108)
        at 
org.apache.hadoop.ipc.SocketChannelOutputStream.write(SocketChannelOutputStream.java:89)
        at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
        at java.io.DataOutputStream.flush(DataOutputStream.java:106)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:532)
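
For anyone who wants to check other tasktracker logs for the same sequence (an RPC timeout logged inside a task, followed by the tasktracker reporting that task as done), a rough sketch like the one below might help. The function name and the regular expressions are assumptions of mine, based only on the excerpt above; they are not part of Hadoop, and the log format may differ in other versions.

```python
import re

# Task IDs as they appear in the excerpt above, e.g. task_0197_m_000137_0.
# Allowing "r" for reduce tasks is an assumption; the excerpt only shows "m".
TASK_ID = r"task_\d+_[mr]_\d+_\d+"

# Child-process WARN line relayed by TaskRunner (hypothetical pattern).
TIMEOUT_RE = re.compile(
    r"(" + TASK_ID + r") \S+ \S+ WARN mapred\.TaskRunner: "
    r"java\.net\.SocketTimeoutException"
)
# TaskTracker line announcing completion (hypothetical pattern).
DONE_RE = re.compile(r"Task (" + TASK_ID + r") is done\.")


def tasks_done_after_rpc_timeout(log_text):
    """Return sorted task IDs that logged an RPC SocketTimeoutException
    and were later reported as done in the same log text."""
    # Mailers wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())

    # Remember where each task first hit the timeout.
    first_timeout = {}
    for m in TIMEOUT_RE.finditer(flat):
        first_timeout.setdefault(m.group(1), m.start())

    # Keep tasks whose "is done" line appears after that timeout.
    hits = set()
    for m in DONE_RE.finditer(flat):
        task = m.group(1)
        if task in first_timeout and m.start() > first_timeout[task]:
            hits.add(task)
    return sorted(hits)
```

Reading the log file into a string and passing it to tasks_done_after_rpc_timeout() gives the candidate task IDs, which can then be compared with what the jobtracker UI still lists as running.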

> Final map task gets stuck
> -------------------------
>
>                 Key: HADOOP-707
>                 URL: http://issues.apache.org/jira/browse/HADOOP-707
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.8.0
>         Environment: using latest trunk
>            Reporter: Johan Oskarson
>            Priority: Critical
>
> I've seen numerous jobs lately where the final map task gets stuck, never 
> finishing.
> The jobtracker doesn't reassign the task. A restart of the tasktracker solves 
> the issue and the job can finish.
> In the web interface it turns up as:
> task_0028_m_000534_0 node17.herd1 RUNNING 0.00%    10-Nov-2006 12:21:12 
> 10-Nov-2006 12:22:19 (1mins, 6sec)
> Task failed to report status for 604 seconds. Killing.
> Only exception I find in that tasktracker log is this (a few times):
> java.nio.channels.ClosedChannelException
>         at 
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:125)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:294)
>         at 
> org.apache.hadoop.ipc.SocketChannelOutputStream.flushBuffer(SocketChannelOutputStream.java:108)
>         at 
> org.apache.hadoop.ipc.SocketChannelOutputStream.write(SocketChannelOutputStream.java:89)
>         at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>         at java.io.DataOutputStream.flush(DataOutputStream.java:106)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:532)


        
