[ https://issues.apache.org/jira/browse/HADOOP-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649087#action_12649087 ]
Steve Loughran commented on HADOOP-4659:
----------------------------------------

I'm going to put a merged patch up, but although the RPC test is passing,
the spinning appears to be creating a deadlock in TestFileCreationClient;
relevant bits of the thread dump to follow.

1. We're sleeping here holding [EMAIL PROTECTED]

    [junit] "DataStreamer for file /wrwelkj/file9 block blk_-4298389317957709021_1010" id=133 idx=0x210 tid=25976 prio=5 alive, in native, sleeping, native_waiting, daemon
    [junit]     at java/lang/Thread.sleep(J)V(Native Method)
    [junit]     at org/apache/hadoop/ipc/Client$Connection.handleConnectionFailure(Client.java:373)
    [junit]     at org/apache/hadoop/ipc/Client$Connection.setupIOstreams(Client.java:310)
    [junit]     ^-- Holding lock: org/apache/hadoop/ipc/[EMAIL PROTECTED] lock]
    [junit]     at org/apache/hadoop/ipc/Client$Connection.access$1700(Client.java:177)
    [junit]     at org/apache/hadoop/ipc/Client.getConnection(Client.java:791)
    [junit]     at org/apache/hadoop/ipc/Client.call(Client.java:697)
    [junit]     at org/apache/hadoop/ipc/RPC$Invoker.invoke(RPC.java:216)
    [junit]     at $Proxy7.getProtocolVersion(Ljava/lang/String;J)J(Unknown Source)
    [junit]     at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:340)
    [junit]     at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:327)
    [junit]     at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:364)
    [junit]     at org/apache/hadoop/ipc/RPC.waitForProxy(RPC.java:299)
    [junit]     at org/apache/hadoop/ipc/RPC.waitForProxy(RPC.java:286)

2. Which is blocking this

    [junit]     -- Blocked trying to get lock: org/apache/hadoop/ipc/[EMAIL PROTECTED] lock]
    [junit]     at jrockit/vm/Threads.sleep(I)V(Native Method)
    [junit]     at jrockit/vm/Locks.waitForThinRelease(Locks.java:1233)[optimized]
    [junit]     at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1307)[optimized]
    [junit]     at jrockit/vm/Locks.monitorEnter(Locks.java:2389)[optimized]
    [junit]     at org/apache/hadoop/ipc/Client$Connection.addCall(Client.java:219)
    [junit]     at org/apache/hadoop/ipc/Client$Connection.access$1600(Client.java:177)
    [junit]     at org/apache/hadoop/ipc/Client.getConnection(Client.java:785)
    [junit]     at org/apache/hadoop/ipc/Client.call(Client.java:697)
    [junit]     at org/apache/hadoop/ipc/RPC$Invoker.invoke(RPC.java:216)
    [junit]     at $Proxy7.getProtocolVersion(Ljava/lang/String;J)J(Unknown Source)
    [junit]     at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:340)
    [junit]     at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:327)
    [junit]     at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:364)
    [junit]     at org/apache/hadoop/ipc/RPC.waitForProxy(RPC.java:299)
    [junit]     at org/apache/hadoop/ipc/RPC.waitForProxy(RPC.java:286)
    [junit]     at org/apache/hadoop/hdfs/DFSClient.createClientDatanodeProtocolProxy(DFSClient.java:141)
    [junit]     at org/apache/hadoop/hdfs/DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2469)
    [junit]     at org/apache/hadoop/hdfs/DFSClient$DFSOutputStream.access$1700(DFSClient.java:1997)
    [junit]     at org/apache/hadoop/hdfs/DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)

and this

    [junit]     -- Blocked trying to get lock: org/apache/hadoop/ipc/[EMAIL PROTECTED] lock]
    [junit]     at jrockit/vm/Threads.sleep(I)V(Native Method)
    [junit]     at jrockit/vm/Locks.waitForThinRelease(Locks.java:1233)[optimized]
    [junit]     at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1307)[optimized]
    [junit]     at jrockit/vm/Locks.monitorEnter(Locks.java:2389)[optimized]
    [junit]     at org/apache/hadoop/ipc/Client$Connection.addCall(Client.java:219)
    [junit]     at org/apache/hadoop/ipc/Client$Connection.access$1600(Client.java:177)
    [junit]     at org/apache/hadoop/ipc/Client.getConnection(Client.java:785)
    [junit]     at org/apache/hadoop/ipc/Client.call(Client.java:697)
    [junit]     at org/apache/hadoop/ipc/RPC$Invoker.invoke(RPC.java:216)
    [junit]     at $Proxy7.getProtocolVersion(Ljava/lang/String;J)J(Unknown Source)
    [junit]     at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:340)
    [junit]     at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:327)
    [junit]     at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:364)
    [junit]     at org/apache/hadoop/ipc/RPC.waitForProxy(RPC.java:299)
    [junit]     at org/apache/hadoop/ipc/RPC.waitForProxy(RPC.java:286)
    [junit]     at org/apache/hadoop/hdfs/DFSClient.createClientDatanodeProtocolProxy(DFSClient.java:141)
    [junit]     at org/apache/hadoop/hdfs/DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2469)
    [junit]     at org/apache/hadoop/hdfs/DFSClient$DFSOutputStream.access$1700(DFSClient.java:1997)
    [junit]     at org/apache/hadoop/hdfs/DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
    [junit]     ^-- Holding lock: java/util/[EMAIL PROTECTED] lock]
    [junit]     at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    [junit]     -- end of trace

and this

    [junit] "DataStreamer for file /wrwelkj/file5 block blk_7479178383257153500_1010" id=127 idx=0x200 tid=25971 prio=5 alive, in native, blocked, daemon
    [junit]     -- Blocked trying to get lock: org/apache/hadoop/ipc/[EMAIL PROTECTED] lock]
    [junit]     at jrockit/vm/Threads.sleep(I)V(Native Method)
    [junit]     at jrockit/vm/Locks.waitForThinRelease(Locks.java:1233)[optimized]
    [junit]     at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1307)[optimized]
    [junit]     at jrockit/vm/Locks.monitorEnter(Locks.java:2389)[optimized]
    [junit]     at org/apache/hadoop/ipc/Client$Connection.addCall(Client.java:219)
    [junit]     at org/apache/hadoop/ipc/Client$Connection.access$1600(Client.java:177)
    [junit]     at org/apache/hadoop/ipc/Client.getConnection(Client.java:785)
    [junit]     at org/apache/hadoop/ipc/Client.call(Client.java:697)
    [junit]     at org/apache/hadoop/ipc/RPC$Invoker.invoke(RPC.java:216)
    [junit]     at $Proxy7.getProtocolVersion(Ljava/lang/String;J)J(Unknown Source)
    [junit]     at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:340)
    [junit]     at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:327)
    [junit]     at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:364)
    [junit]     at org/apache/hadoop/ipc/RPC.waitForProxy(RPC.java:299)
    [junit]     at org/apache/hadoop/ipc/RPC.waitForProxy(RPC.java:286)
    [junit]     at org/apache/hadoop/hdfs/DFSClient.createClientDatanodeProtocolProxy(DFSClient.java:141)
    [junit]     at org/apache/hadoop/hdfs/DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2469)
    [junit]     at org/apache/hadoop/hdfs/DFSClient$DFSOutputStream.access$1700(DFSClient.java:1997)
    [junit]     at org/apache/hadoop/hdfs/DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
    [junit]     ^-- Holding lock: java/util/[EMAIL PROTECTED] lock]
    [junit]     at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)

So: the sleep in setupIOstreams, which runs while the connection lock is
held, appears to be blocking the other threads in addCall. For some reason,
<junit> isn't timing out or killing the process, which implies this is
fairly serious.

> Root cause of connection failure is being lost to code that uses it for delaying startup
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4659
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4659
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 0.18.3
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Blocker
>             Fix For: 0.18.3
>
>         Attachments: connectRetry.patch, hadoop-4659.patch, rpcConn.patch
>
>
> In ipc.Client, the root cause of a connection failure is being lost as the
> exception is wrapped; hence the outside code, which looks for that root
> cause, isn't working as expected. The result is that you can't bring up a
> task tracker before the job tracker, and probably the same for a datanode
> before a namenode.
> The change that triggered this has not yet been located; I had thought
> it was HADOOP-3844, but I no longer believe this is the case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
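
The blocking pattern in the thread dump (one thread sleeping inside a
synchronized section while others queue up for the same monitor) can be
reproduced in isolation. The following is a minimal sketch, not the Hadoop
code: SleepUnderLockDemo, retryHoldingLock, retryReleasingLock and
measureBlockedMillis are hypothetical names, and the addCall method here is
only a stand-in for the role Client$Connection.addCall plays in the trace. It
relies only on the fact that Thread.sleep keeps the monitor held, whereas
Object.wait releases it for the duration of the wait.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; it does not use or mirror the real ipc.Client API.
class SleepUnderLockDemo {
    private final Object lock = new Object();

    // Stand-in for a retry back-off that sleeps inside the synchronized section.
    void retryHoldingLock(long backoffMillis, CountDownLatch lockTaken)
            throws InterruptedException {
        synchronized (lock) {
            lockTaken.countDown();
            Thread.sleep(backoffMillis); // the monitor stays held for the whole sleep
        }
    }

    // Same back-off, but Object.wait gives the monitor up while waiting.
    // (A spurious wake-up would only shorten the back-off in this sketch.)
    void retryReleasingLock(long backoffMillis, CountDownLatch lockTaken)
            throws InterruptedException {
        synchronized (lock) {
            lockTaken.countDown();
            lock.wait(backoffMillis);
        }
    }

    // Stand-in for a caller that needs the same lock only briefly.
    void addCall() {
        synchronized (lock) {
            // register the call; nothing to do in this sketch
        }
    }

    // Returns how long addCall() had to wait for the lock, in milliseconds.
    static long measureBlockedMillis(boolean releaseWhileWaiting, long backoffMillis) {
        SleepUnderLockDemo demo = new SleepUnderLockDemo();
        CountDownLatch lockTaken = new CountDownLatch(1);
        Thread retrier = new Thread(() -> {
            try {
                if (releaseWhileWaiting) {
                    demo.retryReleasingLock(backoffMillis, lockTaken);
                } else {
                    demo.retryHoldingLock(backoffMillis, lockTaken);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        retrier.setDaemon(true);
        retrier.start();
        try {
            lockTaken.await(); // the retrier now owns the monitor
            long start = System.nanoTime();
            demo.addCall();
            long blocked = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            retrier.join();
            return blocked;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("sleep under lock: caller blocked for about "
                + measureBlockedMillis(false, 500) + " ms");
        System.out.println("wait on monitor:  caller blocked for about "
                + measureBlockedMillis(true, 500) + " ms");
    }
}
```

Running main prints how long the second thread waited for the lock in each
variant; with the sleep variant that should be close to the full back-off
interval. Whether waiting on the monitor is the right change for ipc.Client
is a separate question; the sketch only illustrates why a sleep under the
lock stalls every caller of addCall.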