[
https://issues.apache.org/jira/browse/HDFS-14852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992025#comment-16992025
]
Stephen O'Donnell commented on HDFS-14852:
------------------------------------------
Looking at the original code, in BlockManager.removeBlock(), it does not guess
the level to remove from, it always passes LowRedundancyBlocks.LEVEL.
{code}
neededReconstruction.remove(block, LowRedundancyBlocks.LEVEL);
{code}
This means the first part of the method is never executed, and it will always
iterate all the queues until it finds an entry to remove:
{code}
if(priLevel >= 0 && priLevel < LEVEL // Never executed on block delete as
priLevel == LEVEL
&& priorityQueues.get(priLevel).remove(block)) {
...
return true;
} else {
// Try to remove the block from all queues if the block was
// not found in the queue for the given priority level.
for (int i = 0; i < LEVEL; i++) {
if (i != priLevel && priorityQueues.get(i).remove(block)) {
NameNode.blockStateChangeLog.debug(
"BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block" +
" {} from priority queue {}", block, i);
decrementBlockStat(block, i, oldExpectedReplicas);
return true;
}
}
}
return false;
}
{code}
In the most common case, for a delete of a given block, there will be no
reference in the lowRedundancyQueue (most blocks are perfectly replicated), but
based on the above, it has always been checking all 5 queues the majority of
the time, so I wonder if the performance concern of deleting all queues is as
bad as we think.
I wonder if the call to remove I mentioned above should always have been:
{code}
neededReconstruction.remove(block, LowRedundancyBlocks.LEVEL - 1);
{code}
That way it would always attempt to delete from the corrupt list and if it gets
nothing, try the other queues. If something is left behind in the other queues
it would get deleted anyway later by the redundancy monitor.
Other calls to neededReconstruction.remove() pass a priority, but that is
because those calls know the queue the block was taken from (they don't really
guess / calculate the priority, they just know where it came from), but as the
write lock is dropped after getting the list of blocks the block could be moved
to another level:
{code}
int computeBlockReconstructionWork(int blocksToProcess) {
List<List<BlockInfo>> blocksToReconstruct = null;
namesystem.writeLock();
try {
// Choose the blocks to be reconstructed
blocksToReconstruct = neededReconstruction
.chooseLowRedundancyBlocks(blocksToProcess);
} finally {
namesystem.writeUnlock();
}
return computeReconstructionWorkForBlocks(blocksToReconstruct);
}
{code}
I need to check the 005 patch a bit more tomorrow and think on this a bit more.
Based on my logic above, where the common case for deletes already checks all
unless it finds a match, and the other cases pass a priority which is almost
always correct, and rarely iterator the queues, I do wonder if simply deleting
all queues is the simplest solution.
> Remove of LowRedundancyBlocks do NOT remove the block from all queues
> ---------------------------------------------------------------------
>
> Key: HDFS-14852
> URL: https://issues.apache.org/jira/browse/HDFS-14852
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.2.0, 3.0.3, 3.1.2, 3.3.0
> Reporter: Fei Hui
> Assignee: Fei Hui
> Priority: Major
> Attachments: CorruptBlocksMismatch.png, HDFS-14852.001.patch,
> HDFS-14852.002.patch, HDFS-14852.003.patch, HDFS-14852.004.patch,
> HDFS-14852.005.patch, screenshot-1.png
>
>
> LowRedundancyBlocks.java
> {code:java}
> // Some comments here
> if(priLevel >= 0 && priLevel < LEVEL
> && priorityQueues.get(priLevel).remove(block)) {
> NameNode.blockStateChangeLog.debug(
> "BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block {}"
> + " from priority queue {}",
> block, priLevel);
> decrementBlockStat(block, priLevel, oldExpectedReplicas);
> return true;
> } else {
> // Try to remove the block from all queues if the block was
> // not found in the queue for the given priority level.
> for (int i = 0; i < LEVEL; i++) {
> if (i != priLevel && priorityQueues.get(i).remove(block)) {
> NameNode.blockStateChangeLog.debug(
> "BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block" +
> " {} from priority queue {}", block, i);
> decrementBlockStat(block, i, oldExpectedReplicas);
> return true;
> }
> }
> }
> return false;
> }
> {code}
> Source code is above, the comments as follow
> {quote}
> // Try to remove the block from all queues if the block was
> // not found in the queue for the given priority level.
> {quote}
> The function "remove" does NOT remove the block from all queues.
> Function add from LowRedundancyBlocks.java is used on some places and maybe
> one block in two or more queues.
> We found that corrupt blocks mismatch corrupt files on NN web UI. Maybe it is
> related to this.
> Upload initial patch
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]