Rushabh S Shah created HDFS-7704:
------------------------------------

             Summary: DN heartbeat to Active NN may be blocked and expire if 
connection to Standby NN continues to time out. 
                 Key: HDFS-7704
                 URL: https://issues.apache.org/jira/browse/HDFS-7704
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode, namenode
    Affects Versions: 2.5.0
            Reporter: Rushabh S Shah
            Assignee: Rushabh S Shah


There are a couple of synchronous calls in BPOfferService (i.e. reportBadBlocks 
and trySendErrorReport) that wait for both of the actor threads to 
process them.
These calls are made with the write lock held.
When reportBadBlocks() is blocked at the RPC layer because a NN is unreachable, 
subsequent heartbeat response processing has to wait for the write lock. It 
eventually gets through, but takes so long that it blocks the next heartbeat.
In our HA cluster setup, the standby namenode was taking a long time to process 
the request.
Requesting an improvement in the datanode to make the above calls asynchronous. 
These reports don't have any specific deadlines, so a few extra seconds of 
delay should be acceptable.
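
To illustrate the request, here is a minimal sketch of one way the calls could 
be made asynchronous. It is not the actual DataNode code: AsyncReportSketch, 
Actor, enqueue and drain are hypothetical names chosen for this example. The 
idea is that reportBadBlocks only queues work on each actor while holding the 
lock, and each actor thread performs the slow RPC later from its own loop, with 
no shared lock held.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; these names do not mirror the real HDFS classes.
class AsyncReportSketch {

    /** Stand-in for the state of one per-NameNode actor thread. */
    static class Actor {
        final String name;
        // Work queued by other threads, consumed only by this actor.
        final ConcurrentLinkedQueue<Runnable> pending = new ConcurrentLinkedQueue<>();
        // Reports "sent" so far; touched only by the thread that calls drain().
        final List<String> sent = new ArrayList<>();

        Actor(String name) { this.name = name; }

        /** Non-blocking: only records the work to be done. */
        void enqueue(Runnable action) { pending.add(action); }

        /** Run from the actor's own loop, without any shared lock held. */
        int drain() {
            int processed = 0;
            Runnable action;
            while ((action = pending.poll()) != null) {
                action.run(); // the possibly slow RPC would happen here
                processed++;
            }
            return processed;
        }
    }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<Actor> actors = new ArrayList<>();

    void addActor(Actor actor) {
        lock.writeLock().lock();
        try {
            actors.add(actor);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Queues the report on every actor and returns without waiting on any RPC. */
    void reportBadBlocks(String blockId) {
        lock.readLock().lock();
        try {
            for (Actor actor : actors) {
                actor.enqueue(() -> actor.sent.add(blockId));
            }
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

With this shape, an actor whose NN is slow or unreachable delays only its own 
queue. The lock is held just long enough to enqueue, so heartbeat response 
processing for the other NN would not have to wait behind a stuck RPC.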



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
