wudeyu created HDFS-16100:
-----------------------------

             Summary: HA: Improve performance of Standby node transition to Active
                 Key: HDFS-16100
                 URL: https://issues.apache.org/jira/browse/HDFS-16100
             Project: Hadoop HDFS
          Issue Type: Wish
          Components: namenode
            Reporter: wudeyu

pendingDNMessages in the Standby Node holds postponed block reports so that they can be processed later. Block reports in pendingDNMessages are handled as follows:
# If the GS of the replica is in the future, the Standby Node processes the report when the corresponding edit log entry (e.g. add_block) is loaded.
# If the replica is corrupted, the Standby Node processes the report when it transitions to Active.
# If the DataNode is removed, the corresponding block reports are removed from pendingDNMessages.

As a result, the transition takes longer as the number of corrupted replicas grows. In our case, there were 60 million block reports in pendingDNMessages before the transition. Processing them took almost 7 minutes, and it was killed by zkfc.

Most of these block reports have replica state RBW with a wrong GS (lower than that of the stored block in the Standby Node).

In my opinion, the Standby Node could ignore block reports whose replica state is RBW with a wrong GS, because the Active node/DataNode will remove those replicas later.
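
The proposed check, skipping postponed reports whose replica state is RBW and whose GS is lower than the stored block's GS, could look roughly like the sketch below. This is only an illustration under stated assumptions: StaleRbwFilterSketch, ReportedReplica, shouldIgnore and keepForQueue are hypothetical names, not the actual NameNode classes or methods, and keeping reports for blocks the Standby does not know about is an assumption of the sketch rather than something stated in this issue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: these types are hypothetical stand-ins,
// not the real Hadoop NameNode classes.
final class StaleRbwFilterSketch {

    // Simplified replica states; only the ones needed for the example.
    enum ReplicaState { FINALIZED, RBW }

    // One postponed block report entry (hypothetical shape).
    static final class ReportedReplica {
        final long blockId;
        final long genStamp;
        final ReplicaState state;

        ReportedReplica(long blockId, long genStamp, ReplicaState state) {
            this.blockId = blockId;
            this.genStamp = genStamp;
            this.state = state;
        }
    }

    // True when the report is RBW and its GS is lower than the stored block's GS.
    static boolean shouldIgnore(ReportedReplica reported, long storedGenStamp) {
        return reported.state == ReplicaState.RBW
                && reported.genStamp < storedGenStamp;
    }

    // Returns the reports that would still be queued. Reports for blocks
    // unknown to the Standby are kept (an assumption of this sketch).
    static List<ReportedReplica> keepForQueue(List<ReportedReplica> reports,
                                              Map<Long, Long> storedGenStamps) {
        List<ReportedReplica> kept = new ArrayList<>();
        for (ReportedReplica r : reports) {
            Long stored = storedGenStamps.get(r.blockId);
            if (stored != null && shouldIgnore(r, stored)) {
                continue; // stale RBW replica: skip instead of queueing
            }
            kept.add(r);
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<Long, Long> stored = new HashMap<>();
        stored.put(1L, 101L);
        stored.put(2L, 200L);

        List<ReportedReplica> reports = new ArrayList<>();
        reports.add(new ReportedReplica(1L, 100L, ReplicaState.RBW));       // RBW, older GS
        reports.add(new ReportedReplica(2L, 205L, ReplicaState.RBW));       // RBW, future GS
        reports.add(new ReportedReplica(1L, 100L, ReplicaState.FINALIZED)); // not RBW

        System.out.println("queued reports: " + keepForQueue(reports, stored).size());
    }
}
```

Only the RBW report with the older GS is dropped here. Reports with a future GS and non-RBW reports with a wrong GS still go through the existing postponed path, so the sketch narrows what is queued without changing how the remaining cases are handled.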