[
https://issues.apache.org/jira/browse/HDFS-14852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992362#comment-16992362
]
Stephen O'Donnell commented on HDFS-14852:
------------------------------------------
Thinking on this issue a little more, I believe it makes sense to delete the
block from all queues if a match is not found in queue for the passed in
'LEVEL'. It simplifies the code a little and makes the intent of the method
easier to understand.
My reasoning is:
1) For deletes, the majority of the time, there will be nothing in the
lowRedundancyQueue, so right now, it will iterate over all queues. In the rare
case, it will stop after finding an entry. Optimising this by first deleting
from CORRUPT (1 search) and then maybe finding something in the other 4 queues
(average 2 searches) will likely result in searching all queues most of the
time anyway.
2) For non-deletes, the calls to remove pass the LEVEL which is usually
correct, except in rare circumstances. Therefore it will get an exact match on
the suggested queue and not iterate any queues, but in the rare case when LEVEL
is not correct, we save little by stopping the search early.
Therefore I think something like this would work well with basically the same
performance as the existing code:
{code}
boolean remove(BlockInfo block, int priLevel, int oldExpectedReplicas) {
if(priLevel >= 0 && priLevel < LEVEL
&& priorityQueues.get(priLevel).remove(block)) {
NameNode.blockStateChangeLog.debug(
"BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block {}"
+ " from priority queue {}",
block, priLevel);
decrementBlockStat(block, priLevel, oldExpectedReplicas);
return true;
} else {
// Try to remove the block from all queues if the block was
// not found in the queue for the given priority level.
boolean found = false;
for (int i = 0; i < LEVEL; i++) {
if (i != priLevel && priorityQueues.get(i).remove(block)) {
NameNode.blockStateChangeLog.debug(
"BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block" +
" {} from priority queue {}", block, i);
decrementBlockStat(block, i, oldExpectedReplicas);
found = true;
}
}
return found;
}
}
{code}
Adding the other change to blockManager makes sense too:
{code}
- if (bi == null) {
+ if (bi == null || bi.isDeleted()) {
{code}
> Remove of LowRedundancyBlocks do NOT remove the block from all queues
> ---------------------------------------------------------------------
>
> Key: HDFS-14852
> URL: https://issues.apache.org/jira/browse/HDFS-14852
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.2.0, 3.0.3, 3.1.2, 3.3.0
> Reporter: Fei Hui
> Assignee: Fei Hui
> Priority: Major
> Attachments: CorruptBlocksMismatch.png, HDFS-14852.001.patch,
> HDFS-14852.002.patch, HDFS-14852.003.patch, HDFS-14852.004.patch,
> HDFS-14852.005.patch, screenshot-1.png
>
>
> LowRedundancyBlocks.java
> {code:java}
> // Some comments here
> if(priLevel >= 0 && priLevel < LEVEL
> && priorityQueues.get(priLevel).remove(block)) {
> NameNode.blockStateChangeLog.debug(
> "BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block {}"
> + " from priority queue {}",
> block, priLevel);
> decrementBlockStat(block, priLevel, oldExpectedReplicas);
> return true;
> } else {
> // Try to remove the block from all queues if the block was
> // not found in the queue for the given priority level.
> for (int i = 0; i < LEVEL; i++) {
> if (i != priLevel && priorityQueues.get(i).remove(block)) {
> NameNode.blockStateChangeLog.debug(
> "BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block" +
> " {} from priority queue {}", block, i);
> decrementBlockStat(block, i, oldExpectedReplicas);
> return true;
> }
> }
> }
> return false;
> }
> {code}
> Source code is above, the comments as follow
> {quote}
> // Try to remove the block from all queues if the block was
> // not found in the queue for the given priority level.
> {quote}
> The function "remove" does NOT remove the block from all queues.
> Function add from LowRedundancyBlocks.java is used on some places and maybe
> one block in two or more queues.
> We found that corrupt blocks mismatch corrupt files on NN web UI. Maybe it is
> related to this.
> Upload initial patch
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]