Daryn Sharp created HDFS-8674:
---------------------------------

             Summary: Improve performance of postponed block scans
                 Key: HDFS-8674
                 URL: https://issues.apache.org/jira/browse/HDFS-8674
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: HDFS
    Affects Versions: 2.6.0
            Reporter: Daryn Sharp
            Assignee: Daryn Sharp
            Priority: Critical


When a standby NameNode becomes active, it marks all DataNodes as "stale", 
which causes block invalidations for over-replicated blocks to be postponed 
until full block reports are received from the nodes holding the block.  The 
replication monitor scans the queue of postponed blocks with O(N) runtime.  It 
picks a random offset and iterates through the set to randomize blocks scanned.
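
The cost of the scan described above can be illustrated with a minimal Java
sketch. This is not the NameNode code: the class, method, and variable names
are hypothetical, and a plain HashSet of block IDs stands in for the postponed
block set. It only shows why starting at a random offset in a set without
random access means stepping over every earlier element first.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical illustration only; names do not come from the Hadoop source.
class PostponedScanSketch {
    /** Iterator steps taken by the most recent scan, kept for illustration. */
    static long steps = 0;

    /**
     * Scans up to blocksPerRescan entries of the postponed set, starting at a
     * random offset. A hash set has no random access, so reaching the offset
     * requires stepping over every element before it.
     */
    static List<Long> scan(Set<Long> postponed, int blocksPerRescan, Random rng) {
        List<Long> scanned = new ArrayList<>();
        steps = 0;
        if (postponed.isEmpty() || blocksPerRescan <= 0) {
            return scanned;
        }
        int startIndex = rng.nextInt(postponed.size());
        Iterator<Long> it = postponed.iterator();
        // Skip phase: O(startIndex) work that examines no block at all.
        for (int i = 0; i < startIndex; i++) {
            it.next();
            steps++;
        }
        // Scan phase: the only part that does useful work.
        while (it.hasNext() && scanned.size() < blocksPerRescan) {
            scanned.add(it.next());
            steps++;
        }
        return scanned;
    }

    public static void main(String[] args) {
        Set<Long> postponed = new HashSet<>();
        for (long id = 0; id < 1_000_000; id++) {
            postponed.add(id);
        }
        List<Long> batch = scan(postponed, 2_000, new Random());
        System.out.println("scanned " + batch.size() + " blocks using "
            + steps + " iterator steps");
    }
}
```

In this sketch the skip phase averages about half the set size per scan, so
the work grows with the number of postponed blocks rather than with the number
of blocks actually rescanned.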

The result is devastating when a cluster loses multiple nodes during a rolling 
upgrade: re-replication occurs, the nodes come back, and the excess block 
invalidations are postponed. Rescanning just 2k blocks out of millions of 
postponed blocks may take multiple seconds. The write lock is held during the 
scan, which stalls all other processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
