Daryn Sharp created HDFS-5947:
---------------------------------

             Summary: Improve dead node detection and handling
                 Key: HDFS-5947
                 URL: https://issues.apache.org/jira/browse/HDFS-5947
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: namenode
    Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0
            Reporter: Daryn Sharp

When {{HeartbeatManager.heartbeatCheck}} runs:
# All DNs are scanned to count dead nodes
# The first dead node is processed
# If there was a dead node, the loop re-scans all DNs

Processing the dead node holds the namesystem write lock while removing the node from the blockmap. It also appears to do a lot of work to immediately re-adjust the replication queues.

All this processing might be too expensive while holding the write lock, e.g. if a rack or two is lost and many nodes are declared dead in the same check.
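
To make the cost of the scan pattern described above concrete, here is a minimal, self-contained sketch. It is not the HDFS code: the class {{HeartbeatScanSketch}}, the methods {{rescanVisits}} and {{collectDeadOnce}}, and the boolean-array model of the DN list are all hypothetical, and "processing" a node is reduced to clearing a flag. The first method mimics the scan, process one, re-scan loop and counts how many DN entries are visited; the second shows, purely as an assumption about one possible direction, collecting every dead node in a single pass so that the expensive work can be done per node afterwards.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not the real HeartbeatManager: dead[i] == true means DN i is dead.
final class HeartbeatScanSketch {

    // Mimics the described loop: full scan, process the first dead node, repeat.
    // Returns the total number of DN entries visited across all scans.
    static long rescanVisits(boolean[] dead) {
        boolean[] state = dead.clone(); // do not mutate the caller's array
        long visits = 0;
        boolean foundDead = true;
        while (foundDead) {
            foundDead = false;
            int firstDead = -1;
            for (int i = 0; i < state.length; i++) { // every pass touches all DNs
                visits++;
                if (state[i] && firstDead < 0) {
                    firstDead = i;
                }
            }
            if (firstDead >= 0) {
                // In the real system this step would run under the write lock.
                state[firstDead] = false;
                foundDead = true;
            }
        }
        return visits;
    }

    // Assumed alternative: one pass that records every dead node for later handling.
    static List<Integer> collectDeadOnce(boolean[] dead) {
        List<Integer> deadNodes = new ArrayList<>();
        for (int i = 0; i < dead.length; i++) {
            if (dead[i]) {
                deadNodes.add(i);
            }
        }
        return deadNodes;
    }

    public static void main(String[] args) {
        boolean[] dead = new boolean[1000];
        for (int i = 0; i < 40; i++) {
            dead[i] = true; // pretend one 40-node rack was lost
        }
        System.out.println("rescan visits: " + rescanVisits(dead));
        System.out.println("dead found in one pass: " + collectDeadOnce(dead).size());
    }
}
```

In this toy model with 1000 DNs and 40 dead, the re-scan loop makes 41 full passes (one per dead node plus a final pass that finds none), while the single pass touches each DN once. The sketch says nothing about the blockmap or replication-queue work itself, which is the part held under the write lock.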