[jira] [Commented] (HADOOP-5380) Calls from running tasks to TaskTracker methods sometimes fail and incur a 60s timeout

Liyin Liang (JIRA) Thu, 21 Apr 2011 06:44:48 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-5380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022744#comment-13022744
 ]


Liyin Liang commented on HADOOP-5380:
-------------------------------------

We met the same problem with hadoop 0.19.1 when run terasort. Here is the task 
log:
2011-04-19 01:10:33,715 INFO org.apache.hadoop.mapred.TaskRunner: Task 
attempt_201104190103_0002_r_000442_0 is allowed to commit now
2011-04-19 01:10:33,723 INFO org.apache.hadoop.mapred.FileOutputCommitter: 
Saved output of task 'attempt_201104190103_0002_r_000442_0' to 
hdfs://r05b02043.yh.aliyun.com:9000/group/dc/willwu/terasort-out
2011-04-19 01:11:33,725 INFO org.apache.hadoop.mapred.TaskRunner: Communication 
exception: java.io.IOException: Call to /127.0.0.1:48950 failed on local 
exception: java.nio.channels.ClosedByInterruptException
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:736)
        at org.apache.hadoop.ipc.Client.call(Client.java:704)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at org.apache.hadoop.mapred.$Proxy0.statusUpdate(Unknown Source)
        at org.apache.hadoop.mapred.Task$1.run(Task.java:417)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.nio.channels.ClosedByInterruptException
        at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
        at 
org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
        at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140)
        at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
        at java.io.DataOutputStream.flush(DataOutputStream.java:106)
        at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:473)
        at org.apache.hadoop.ipc.Client.call(Client.java:691)
        ... 4 more

2011-04-19 01:11:33,728 INFO org.apache.hadoop.mapred.TaskRunner: Task 
'attempt_201104190103_0002_r_000442_0' done.

Hope this bug to be fixed ASAP.

> Calls from running tasks to TaskTracker methods sometimes fail and incur a 
> 60s timeout
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5380
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5380
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>            Reporter: Matei Zaharia
>         Attachments: HADOOP-5380.Y.20.branch.patch, 
> HADOOP_5380-Y.0.20.20x.patch
>
>
> I'm seeing some map tasks in my jobs take 1 minute to commit after they 
> finish the map computation. On the map side, the output looks like this:
> <code>
> 2009-03-02 21:30:54,384 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Cannot 
> initialize JVM Metrics with processName=MAP, sessionId= - already initialized
> 2009-03-02 21:30:54,437 INFO org.apache.hadoop.mapred.MapTask: 
> numReduceTasks: 800
> 2009-03-02 21:30:54,437 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 
> 300
> 2009-03-02 21:30:55,493 INFO org.apache.hadoop.mapred.MapTask: data buffer = 
> 239075328/298844160
> 2009-03-02 21:30:55,494 INFO org.apache.hadoop.mapred.MapTask: record buffer 
> = 786432/983040
> 2009-03-02 21:31:00,381 INFO org.apache.hadoop.mapred.MapTask: Starting flush 
> of map output
> 2009-03-02 21:31:07,892 INFO org.apache.hadoop.mapred.MapTask: Finished spill > 0
> 2009-03-02 21:31:07,951 INFO org.apache.hadoop.mapred.TaskRunner: 
> Task:attempt_200903022127_0001_m_003163_0 is done. And is in the process of 
> commiting
> 2009-03-02 21:32:07,949 INFO org.apache.hadoop.mapred.TaskRunner: 
> Communication exception: java.io.IOException: Call to /127.0.0.1:50311 failed 
> on local exception: java.nio.channels.ClosedChannelException
>       at org.apache.hadoop.ipc.Client.wrapException(Client.java:765)
>       at org.apache.hadoop.ipc.Client.call(Client.java:733)
>       at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>       at org.apache.hadoop.mapred.$Proxy0.ping(Unknown Source)
>       at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:525)
>       at java.lang.Thread.run(Thread.java:619)
> Caused by: java.nio.channels.ClosedChannelException
>       at 
> java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:167)
>       at 
> java.nio.channels.SelectableChannel.register(SelectableChannel.java:254)
>       at 
> org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:331)
>       at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
>       at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>       at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>       at java.io.FilterInputStream.read(FilterInputStream.java:116)
>       at 
> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:276)
>       at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>       at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>       at java.io.DataInputStream.readInt(DataInputStream.java:370)
>       at 
> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
>       at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
> 2009-03-02 21:32:07,953 INFO org.apache.hadoop.mapred.TaskRunner: Task 
> 'attempt_200903022127_0001_m_003163_0' done.
> </code>
> In the TaskTracker log, it looks like this:
> <code>
> 2009-03-02 21:31:08,110 WARN org.apache.hadoop.ipc.Server: IPC Server 
> Responder, call ping(attempt_200903022127_0001_m_003163_0) from 
> 127.0.0.1:56884: output error
> 2009-03-02 21:31:08,111 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 10 on 50311 caught: java.nio.channels.ClosedChannelException
>     at 
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)    at 
> org.apache.hadoop.ipc.Server.channelWrite(Server.java:1195)
>     at org.apache.hadoop.ipc.Server.access$1900(Server.java:77)
>     at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:613)
>     at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:677)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:981)
> </code>
> Note that the task actually seemed to commit - it didn't get speculatively 
> executed or anything. However, the job wasn't able to continue until this one 
> task was done. Both parties seem to think the channel was closed. How does 
> the channel get closed externally? If closing it from outside is unavoidable, 
> maybe the right thing to do is to set a much lower timeout, because 1 minute 
> delay can be pretty significant for a small job.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-5380) Calls from running tasks to TaskTracker methods sometimes fail and incur a 60s timeout

Reply via email to