[
https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896092#comment-13896092
]
Wilfred Spiegelenburg commented on HDFS-4858:
---------------------------------------------
>I thought the client timeout problem was solved by HDFS-4646. It's been
>working fine for us since then. Is it a different problem you are talking
>about?
There are other parts of the code that use the IPC client; it is not just
limited to the datanode. The TaskTracker, as mentioned, is one example for MR1.
The fact that the client does not time out for a write is still there. You now
need to find every single place where the client is used and make sure that you
fix it there, which is what I tried to point out.
Instead of pulling the timeout for writes out of the client code (Client.java),
it should be handled transparently for the caller inside the Client code.
>> see around line 600
>Line 600 of what?
Line 600 of the Client.java code. If rpcTimeout is not set (0 or -1), the
pingInterval is used to make sure that the read times out. A similar construct
should also be used on the write side, so that the write times out even when
rpcTimeout is not set. The current solution keeps a different behaviour for
read and write timeouts, while this would be a chance to fix that.
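To make that concrete, here is a small sketch of both sides (this is not the
actual Client.java source or a patch; the class and method names are mine, and
the write side simply reuses NetUtils.getOutputStream, which as far as I can
see only enforces the timeout on channel-backed sockets):
{code:java}
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

import org.apache.hadoop.net.NetUtils;

/** Sketch only: mirrors the fallback in Client.java, it is not the patch itself. */
public class TimeoutFallbackSketch {

  /**
   * Read side, as Client.java already does it around line 600: if rpcTimeout
   * is not set, fall back to pingInterval so a blocked read always wakes up.
   */
  static void applyReadTimeout(Socket socket, int rpcTimeout, int pingInterval)
      throws IOException {
    int soTimeout = (rpcTimeout > 0) ? rpcTimeout : pingInterval;
    // a blocked read now throws SocketTimeoutException after soTimeout ms
    socket.setSoTimeout(soTimeout);
  }

  /**
   * Write side, the analogous construct that is currently missing: hand out an
   * output stream with a bounded write instead of relying on every caller to
   * have configured rpcTimeout.
   */
  static OutputStream boundedOutputStream(Socket socket, int rpcTimeout,
      int pingInterval) throws IOException {
    long writeTimeout = (rpcTimeout > 0) ? rpcTimeout : pingInterval;
    // the returned stream fails a write that cannot make progress within
    // writeTimeout ms instead of blocking forever on a dead peer
    return NetUtils.getOutputStream(socket, writeTimeout);
  }
}
{code}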
>> It is also used by the TaskTracker
> Are you referring to cdh4, as trunk and Hadoop 2 don't have TaskTracker.
The TaskTracker was used purely as an example, to point out that there are, or
could be, more users of the client code that exhibit the same issue.
>Sorry, I didn't understand the general problem from your descriptions, guys.
>If you could tell how to reproduce it or better propose a fix. The DataNode
>problem is real though, because fail-overs fail when the DN thread gets stuck.
>It would be good to fix it in the next release.
The issue is real for each user of the client code. A write on a socket does
not time out (normal behaviour), and when a hardware failure occurs at the
network layer this becomes apparent (unplug a cable, server HW failure, switch
failure, etc.). Steps to reproduce are given in this bug and in HDFS-4646, and
they all work. TCP timeout and retransmission are the underlying cause of this
'hang', and faster failures can be achieved by lowering the tcp_retries2
parameter at the kernel level.
Instead of looking outside the Client.java code, why do we not time out the
write itself? Currently the code just waits forever for the write to return; a
simple change would be to bound that wait. I can attach a diff for the change
if that makes it easier to see what I mean.
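As an illustration of what I mean by changing that wait, a sketch with
stand-in names (this is not the real Client.call() and not the diff I would
attach; Call here is just a placeholder with a completion flag):
{code:java}
/** Sketch only: shows how an indefinite wait on a call could be bounded. */
public class BoundedCallWait {

  /** Stand-in for the real Call object: just a completion flag. */
  static final class Call {
    boolean done; // set by the receiver thread when a response (or error) arrives
  }

  /**
   * Wait for the call to complete, but never longer than maxWaitMillis.
   * Returns true if the call completed, false if we gave up, in which case the
   * caller can mark the connection broken and retry or fail the call.
   */
  static boolean waitForCompletion(Call call, long maxWaitMillis)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + maxWaitMillis;
    synchronized (call) {
      while (!call.done) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          return false; // timed out instead of waiting forever
        }
        call.wait(remaining);
      }
    }
    return true;
  }
}
{code}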
> HDFS DataNode to NameNode RPC should timeout
> --------------------------------------------
>
> Key: HDFS-4858
> URL: https://issues.apache.org/jira/browse/HDFS-4858
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.0.5-alpha
> Environment: Redhat/CentOS 6.4 64 bit Linux
> Reporter: Jagane Sundar
> Assignee: Konstantin Boudnik
> Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HDFS-4858.patch, HDFS-4858.patch
>
>
> The DataNode is configured with ipc.client.ping false and ipc.ping.interval
> 14000. This configuration means that the IPC Client (the DataNode, in this
> case) should time out in 14000 milliseconds (14 seconds) if the Standby
> NameNode does not respond to a sendHeartbeat.
> What we observe is this: If the Standby NameNode happens to reboot for any
> reason, the DataNodes that are heartbeating to this Standby get stuck forever
> while trying to sendHeartbeat. See Stack trace included below. When the
> Standby NameNode comes back up, we find that the DataNode never re-registers
> with the Standby NameNode. Thereafter failover completely fails.
> The desired behavior is that the DataNode's sendHeartbeat should time out in
> 14 seconds and keep retrying till the Standby NameNode comes back up. When
> it does, the DataNode should reconnect, re-register, and offer service.
> Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the
> method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to
> create the DatanodeProtocolPB object.
> Stack trace of thread stuck in the DataNode after the Standby NN has rebooted:
> Thread 25 (DataNode: [file:///opt/hadoop/data] heartbeating to
> vmhost6-vm1/10.10.10.151:8020):
> State: WAITING
> Blocked count: 23843
> Waited count: 45676
> Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5
> Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:485)
> org.apache.hadoop.ipc.Client.call(Client.java:1220)
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> java.lang.reflect.Method.invoke(Method.java:597)
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445)
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
> java.lang.Thread.run(Thread.java:662)
> DataNode RPC to Standby NameNode never times out.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)