Daryn Sharp created HDFS-5947:
---------------------------------
Summary: Improve dead node detection and handling
Key: HDFS-5947
URL: https://issues.apache.org/jira/browse/HDFS-5947
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0
Reporter: Daryn Sharp
When {{HeartbeatManager.heartbeatCheck}} runs, it:
# Scans all DNs to count dead nodes
# Processes the first dead node
# If there was a dead node, loops back and re-scans all DNs
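The cost of the loop described above can be estimated with a small model. This is a hypothetical Java sketch, not the actual {{HeartbeatManager}} code. The class and method names are invented, and each DN is reduced to a boolean flag meaning "heartbeat expired". Because only one dead node is handled per pass, losing k of N nodes costs k+1 full scans (N*(k+1) node visits). A hypothetical single-pass variant that collects every dead node in one scan visits only N nodes.

{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the heartbeatCheck loop described in HDFS-5947.
// A DN is an index into a boolean array; true = heartbeat expired (dead).
class HeartbeatCheckSketch {

    // Described behavior: scan all DNs, process only the first dead one,
    // then start over. Returns the total number of DN visits.
    static long visitsCurrent(boolean[] dead) {
        boolean[] state = dead.clone(); // leave the caller's array untouched
        long visits = 0;
        while (true) {
            int deadCount = 0;
            int firstDead = -1;
            for (int i = 0; i < state.length; i++) { // full scan of all DNs
                visits++;
                if (state[i]) {
                    deadCount++;
                    if (firstDead < 0) firstDead = i;
                }
            }
            if (deadCount == 0) {
                return visits; // no dead nodes left, loop ends
            }
            // Per the issue, this removal runs under the namesystem write lock.
            state[firstDead] = false;
        }
    }

    // Hypothetical alternative for comparison only: one scan collects
    // every dead DN, then each collected node is processed.
    static long visitsSinglePass(boolean[] dead) {
        boolean[] state = dead.clone();
        long visits = 0;
        List<Integer> toRemove = new ArrayList<>();
        for (int i = 0; i < state.length; i++) {
            visits++;
            if (state[i]) toRemove.add(i);
        }
        for (int i : toRemove) {
            state[i] = false; // per-node removal work would happen here
        }
        return visits;
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 100 DNs, 40 lost (e.g. two racks of 20).
        boolean[] cluster = new boolean[100];
        for (int i = 0; i < 40; i++) cluster[i] = true;
        System.out.println("current loop: " + visitsCurrent(cluster) + " DN visits");
        System.out.println("single pass:  " + visitsSinglePass(cluster) + " DN visits");
    }
}
{code}

For the 100-DN example with 40 dead nodes, the modeled loop makes 41 scans (4100 visits) against 100 visits for a single pass. The model ignores the per-node blockmap and replication-queue work, which is the expensive part under the write lock.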
Processing the dead node holds the namesystem write lock while removing the
node from the blockmap. It also appears to do a lot of work to immediately
re-adjust the replication queues. All of this processing may be too expensive
to do while holding the write lock, e.g. if a rack or two is lost.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)