Surendra Singh Lilhore created HDFS-9659:
--------------------------------------------

             Summary: EditLogTailerThread to Active Namenode RPC should timeout
                 Key: HDFS-9659
                 URL: https://issues.apache.org/jira/browse/HDFS-9659
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: ha, namenode
    Affects Versions: 3.0.0
            Reporter: Surendra Singh Lilhore
            Assignee: Surendra Singh Lilhore
            Priority: Critical


The RPC from {{EditLogTailerThread}} to the Active {{Namenode}} has no timeout; 
the timeout was removed in HDFS-6440.

When a slow disk is injected on the active NameNode and its system I/O is 
consumed, the nameservice cannot fail over, because the standby NameNode (SNN) 
is not able to stop {{EditLogTailerThread}}.
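
As an illustration of what a timeout on such a call could look like, here is a minimal sketch in plain Java. It is not the HDFS code and not necessarily how this issue should be fixed: the class {{BoundedCallSketch}} and the method {{callWithTimeout}} are hypothetical names, and the "stuck" call is simulated with a sleep rather than a real RPC. The idea is to run the blocking call on a helper thread and wait for the result with a deadline.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch, not part of Hadoop: bound a blocking call with a deadline. */
final class BoundedCallSketch {

    /**
     * Runs the call on a helper thread and waits at most timeoutMs for it.
     * Throws TimeoutException if the call has not completed by then.
     */
    static <T> T callWithTimeout(Callable<T> call, long timeoutMs)
            throws InterruptedException, ExecutionException, TimeoutException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-call");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(call);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the helper thread
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a remote call whose reply never arrives.
        Callable<String> stuck = () -> {
            Thread.sleep(60_000);
            return "unreachable";
        };
        Callable<String> fast = () -> "ok";

        System.out.println("fast call returned: " + callWithTimeout(fast, 200));
        try {
            callWithTimeout(stuck, 200);
        } catch (TimeoutException e) {
            System.out.println("stuck call timed out; the caller is free to continue");
        }
    }
}
```

With a bound like this the calling thread regains control after the deadline, so a later request to stop it can be honoured even when the remote side is unresponsive.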

*Thread dump from SNN*
{noformat}
"IPC Server handler 33 on 25000" #118 daemon prio=5 os_prio=0 tid=0x00007f2384409800 nid=0x26c89 in Object.wait() [0x00007f2376ac7000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Thread.join(Thread.java:1245)
        - locked <0x00000006d517f538> (a org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread)
        at java.lang.Thread.join(Thread.java:1319)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.stop(EditLogTailer.java:183)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.stopStandbyServices(FSNamesystem.java:1284)
        at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.stopStandbyServices(NameNode.java:1852)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.exitState(StandbyState.java:72)
        at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:62)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1684)
{noformat}

*Thread dump for {{EditLogTailerThread}}*, which is stuck in the 
{{NamenodeProtocolTranslatorPB.rollEditLog()}} RPC call:
{noformat}
"Edit log tailer" #150 prio=5 os_prio=0 tid=0x00007f2395569800 nid=0x26cac in Object.wait() [0x00007f2374aa7000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at org.apache.hadoop.ipc.Client.call(Client.java:1503)
        - locked <0x00000006d581bb90> (a org.apache.hadoop.ipc.Client$Call)
        at org.apache.hadoop.ipc.Client.call(Client.java:1448)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy16.rollEditLog(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:148)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$2.doWork(EditLogTailer.java:301)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$2.doWork(EditLogTailer.java:298)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$MultipleNameNodeProxy.call(EditLogTailer.java:420)
{noformat}
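
The two dumps show one thread parked in {{Thread.join()}} while the thread it joins is parked in {{Object.wait()}} with no deadline. The following stand-alone sketch reproduces that shape with plain Java threads. It is only an illustration under assumed names ({{StuckJoinSketch}}, {{startStuckWorker}} and {{stillAliveAfterJoin}} are hypothetical), it contains no Hadoop code, and it does not claim to mirror how the IPC client reacts to interrupts.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical sketch, not Hadoop code: a join on a thread that waits without a deadline. */
final class StuckJoinSketch {

    /** Starts a thread that waits on a monitor nobody notifies, like a call awaiting a reply. */
    static Thread startStuckWorker(Object replyMonitor, CountDownLatch started) {
        Thread worker = new Thread(() -> {
            synchronized (replyMonitor) {
                started.countDown();
                try {
                    while (true) {
                        replyMonitor.wait(); // no timeout: returns only on notify or interrupt
                    }
                } catch (InterruptedException e) {
                    // Interrupted: stop waiting and let the thread exit.
                }
            }
        }, "stuck-worker");
        worker.setDaemon(true);
        worker.start();
        return worker;
    }

    /**
     * Waits up to joinTimeoutMs for the worker and reports whether it is still alive.
     * joinTimeoutMs must be positive, because Thread.join(0) waits forever.
     */
    static boolean stillAliveAfterJoin(Thread worker, long joinTimeoutMs)
            throws InterruptedException {
        worker.join(joinTimeoutMs);
        return worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        Thread worker = startStuckWorker(new Object(), started);
        started.await();

        // An unbounded worker.join() here would block for as long as the worker waits.
        System.out.println("alive after bounded join: " + stillAliveAfterJoin(worker, 200));

        worker.interrupt();
        worker.join(2000);
        System.out.println("alive after interrupt: " + worker.isAlive());
    }
}
```

In this sketch the worker exits on interrupt because its wait loop chooses to. A wait loop that swallows the interrupt and keeps waiting would leave an unbounded {{join()}} blocked, which is why a deadline on the wait itself is the more robust guard.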



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
