[
https://issues.apache.org/jira/browse/HDFS-15415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140779#comment-17140779
]
Stephen O'Donnell commented on HDFS-15415:
------------------------------------------
If a block is RBW or RUR before the in-memory snapshot is taken, it will never
be part of the in-memory blocks for that pass of the scanner. RBW blocks should
also be skipped by the disk scan. If a block goes from FINALIZED to RBW (due to
an append), then we may or may not record a difference, depending on the
sequence of events.
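A minimal sketch of the first point follows. It assumes a simplified model where each replica carries only a block ID and a state (RBW = replica being written, RUR = replica under recovery), and only FINALIZED replicas are copied into the in-memory snapshot. The `ReplicaState` enum, `Replica` record and `SnapshotFilter.finalizedSnapshot` helper are hypothetical stand-ins, not the real HDFS `ReplicaInfo` or `getFinalizedBlocks` code.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified replica states; the real HDFS enum has more values.
enum ReplicaState { FINALIZED, RBW, RUR }

// Hypothetical stand-in for a replica: just a block ID and its current state.
record Replica(long blockId, ReplicaState state) {}

class SnapshotFilter {
    // Only FINALIZED replicas are copied into the snapshot, so a replica that
    // is RBW or RUR at snapshot time is never compared in this scanner pass.
    static List<Replica> finalizedSnapshot(List<Replica> all) {
        return all.stream()
                .filter(r -> r.state() == ReplicaState.FINALIZED)
                .collect(Collectors.toList());
    }
}
```

A block that moves to RBW after this filter runs is still in the snapshot. That is the "may or may not record a difference" case above.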
There are always going to be some "false positives" in the comparison, because
the disk picture keeps changing before we take the lock, even with the code as
it is now. That is why I believe we can process the memory snapshot without
holding the lock. The price we pay is possibly some more differences, which
have to be reconciled later.
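The idea of holding the lock only long enough to copy the in-memory state, then comparing without it, could look roughly like the sketch below. It assumes a plain `ReentrantLock` stands in for the DataNode dataset lock and a `TreeSet` of block IDs stands in for the finalized replica map. `LockedDataset`, `snapshot` and `missingOnDisk` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantLock;

class LockedDataset {
    // Hypothetical stand-in for the DataNode dataset lock.
    private final ReentrantLock lock = new ReentrantLock();
    // Finalized block IDs, kept in sorted order.
    private final TreeSet<Long> finalized = new TreeSet<>();

    void addFinalized(long blockId) {
        lock.lock();
        try {
            finalized.add(blockId);
        } finally {
            lock.unlock();
        }
    }

    // The lock is held only long enough to copy the IDs.
    List<Long> snapshot() {
        lock.lock();
        try {
            return new ArrayList<>(finalized);
        } finally {
            lock.unlock();
        }
    }

    // Runs with no lock held. The snapshot may already be stale, so each result
    // is only a candidate difference (possibly a false positive) that a later
    // reconcile step must re-check.
    static List<Long> missingOnDisk(List<Long> memorySnapshot, Set<Long> diskIds) {
        List<Long> missing = new ArrayList<>();
        for (Long id : memorySnapshot) {
            if (!diskIds.contains(id)) {
                missing.add(id);
            }
        }
        return missing;
    }
}
```

Changes made after `snapshot()` returns do not affect the copy. They are picked up, or show up as differences, on a later pass.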
The faster we can make the scan step, the fewer false positives there will be
to reconcile later.
As with many of these locking problems, it is hard to be 100% sure this will
not cause some other problems, but from what I looked at today, I think it
should be good.
> Reduce locking in Datanode DirectoryScanner
> -------------------------------------------
>
> Key: HDFS-15415
> URL: https://issues.apache.org/jira/browse/HDFS-15415
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.4.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Attachments: HDFS-15415.001.patch
>
>
> In HDFS-15406, we have a small change to greatly reduce the runtime and
> locking time of the datanode DirectoryScanner. There may be room for further
> improvement here:
> 1. These lines of code in DirectoryScanner#scan() obtain a snapshot of the
> finalized blocks from memory and then sort them, all under the DN lock.
> However, the blocks are stored in a sorted structure (FoldedTreeSet), so the
> sort should be unnecessary.
> {code}
> final List<ReplicaInfo> bl = dataset.getFinalizedBlocks(bpid);
> Collections.sort(bl); // Sort based on blockId
> {code}
> 2. From the scan step, we have captured a snapshot of what is on disk. After
> calling `dataset.getFinalizedBlocks(bpid);` as above, we have taken a snapshot
> of what is in memory. The two snapshots are never 100% in sync, as things are
> always changing while the disk is scanned.
> We are only comparing finalized blocks, so they should not really change:
> * If a block is deleted after our snapshot, our snapshot will not see it and
> that is OK.
> * A finalized block could be appended. If that happens, both the genstamp and
> length will change, but that should be handled by reconcile when it calls
> `FSDatasetImpl.checkAndUpdate()`. In any case, nothing currently stops blocks
> from being appended after they have been scanned from disk but before they
> have been compared with memory.
> My suspicion is that we can do all the comparison work outside the lock,
> because checkAndUpdate() later re-checks any differences under the lock on a
> block-by-block basis.
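Item 1 relies on the fact that copying a sorted collection preserves its order. The sketch below uses `java.util.TreeSet` as a stand-in for HDFS's internal `FoldedTreeSet` and assumes the set is ordered by block ID, as the `// Sort based on blockId` comment implies. `SortedCopy` is a hypothetical helper, not DirectoryScanner code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class SortedCopy {
    // Copying a sorted set preserves its iteration order, so the resulting list
    // is already in block-ID order and a further Collections.sort() would be a
    // redundant O(n log n) pass (done under the lock in the original code).
    static List<Long> copyOf(TreeSet<Long> blockIds) {
        return new ArrayList<>(blockIds);
    }

    // Linear check that a list is in non-decreasing order.
    static boolean isSorted(List<Long> ids) {
        for (int i = 1; i < ids.size(); i++) {
            if (ids.get(i - 1) > ids.get(i)) {
                return false;
            }
        }
        return true;
    }
}
```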
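For item 2, here is one way a lock-free comparison of two block-ID-sorted snapshots could be written: a single merge-style walk that records blocks only on disk, blocks only in memory, and blocks whose genstamp or length differ, as after an append. `BlockMeta` and `SnapshotDiff` are hypothetical simplifications. The real scanner compares richer per-block information.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-block record: ID, generation stamp and length.
record BlockMeta(long blockId, long genStamp, long length) {}

class SnapshotDiff {
    final List<Long> onlyOnDisk = new ArrayList<>();
    final List<Long> onlyInMemory = new ArrayList<>();
    final List<Long> metaMismatch = new ArrayList<>();

    // Both inputs must be sorted by block ID. Every entry recorded here is only
    // a candidate, since the block may have changed after either snapshot.
    static SnapshotDiff compare(List<BlockMeta> disk, List<BlockMeta> memory) {
        SnapshotDiff d = new SnapshotDiff();
        int i = 0, j = 0;
        while (i < disk.size() && j < memory.size()) {
            BlockMeta a = disk.get(i), b = memory.get(j);
            if (a.blockId() == b.blockId()) {
                // An append changes both the genstamp and the length.
                if (a.genStamp() != b.genStamp() || a.length() != b.length()) {
                    d.metaMismatch.add(a.blockId());
                }
                i++;
                j++;
            } else if (a.blockId() < b.blockId()) {
                d.onlyOnDisk.add(a.blockId());
                i++;
            } else {
                d.onlyInMemory.add(b.blockId());
                j++;
            }
        }
        while (i < disk.size()) d.onlyOnDisk.add(disk.get(i++).blockId());
        while (j < memory.size()) d.onlyInMemory.add(memory.get(j++).blockId());
        return d;
    }
}
```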
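The per-block re-check described at the end of item 2 might look something like the sketch below. It assumes a `ReentrantLock` stands in for the dataset lock and an in-memory map of block ID to genstamp. `currentDisk` is a hypothetical stand-in for re-reading disk state at reconcile time. `Reconciler.reconcile` only loosely mirrors the role of `FSDatasetImpl.checkAndUpdate()` and is not its implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.locks.ReentrantLock;

class Reconciler {
    // Hypothetical stand-in for the dataset lock.
    private final ReentrantLock lock = new ReentrantLock();
    // In-memory view: blockId -> generation stamp.
    private final Map<Long, Long> memory = new HashMap<>();

    void put(long blockId, long genStamp) {
        lock.lock();
        try {
            memory.put(blockId, genStamp);
        } finally {
            lock.unlock();
        }
    }

    Long genStamp(long blockId) {
        lock.lock();
        try {
            return memory.get(blockId);
        } finally {
            lock.unlock();
        }
    }

    // Re-checks each candidate from the lock-free comparison, taking the lock
    // once per block so other operations can interleave between blocks.
    // currentDisk is the (hypothetical) disk state at reconcile time; a missing
    // key means the block is no longer on disk. Returns the number of
    // candidates that were still real differences and were updated.
    int reconcile(List<Long> candidates, Map<Long, Long> currentDisk) {
        int updated = 0;
        for (long id : candidates) {
            lock.lock();
            try {
                Long inMemory = memory.get(id);
                Long onDisk = currentDisk.get(id);
                if (Objects.equals(inMemory, onDisk)) {
                    continue; // false positive: nothing to fix any more
                }
                if (onDisk == null) {
                    memory.remove(id);
                } else {
                    memory.put(id, onDisk);
                }
                updated++;
            } finally {
                lock.unlock();
            }
        }
        return updated;
    }
}
```

Because each candidate is re-validated under the lock, a stale or spurious difference from the unlocked comparison costs only a brief lock acquisition, not an incorrect update.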
--
This message was sent by Atlassian Jira
(v8.3.4#803005)