[
https://issues.apache.org/jira/browse/HDFS-14642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892843#comment-16892843
]
Wei-Chiu Chuang commented on HDFS-14642:
----------------------------------------
HDFS-14053 did not actually land in 3.2.0, so this fix will stay in trunk
only.
> processMisReplicatedBlocks does not return correct processed count
> ------------------------------------------------------------------
>
> Key: HDFS-14642
> URL: https://issues.apache.org/jira/browse/HDFS-14642
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.2.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14642.001.patch
>
>
> HDFS-14053 introduced a method "processMisReplicatedBlocks" to the
> blockManager, and it is used by fsck to schedule mis-replicated blocks for
> replication.
> The method should return the number of blocks it processed, but it always
> returns zero because "processed" is never incremented in the method.
> It should also drop and re-take the write lock every "numBlocksPerIteration"
> blocks, but because "processed" is never incremented it never does so, and
> may therefore hold the write lock for a long time.
> {code:java}
> public int processMisReplicatedBlocks(List<BlockInfo> blocks) {
>   int processed = 0;
>   Iterator<BlockInfo> iter = blocks.iterator();
>   try {
>     while (isPopulatingReplQueues() && namesystem.isRunning()
>         && !Thread.currentThread().isInterrupted()
>         && iter.hasNext()) {
>       int limit = processed + numBlocksPerIteration;
>       namesystem.writeLockInterruptibly();
>       try {
>         while (iter.hasNext() && processed < limit) {
>           BlockInfo blk = iter.next();
>           MisReplicationResult r = processMisReplicatedBlock(blk);
>           LOG.debug("BLOCK* processMisReplicatedBlocks: " +
>               "Re-scanned block {}, result is {}", blk, r);
>         }
>       } finally {
>         namesystem.writeUnlock();
>       }
>     }
>   } catch (InterruptedException ex) {
>     LOG.info("Caught InterruptedException while scheduling replication work" +
>         " for mis-replicated blocks");
>     Thread.currentThread().interrupt();
>   }
>   return processed;
> }{code}
> Due to this, fsck causes a warning to be logged in the NN for every
> mis-replicated file it schedules replication for, as it checks the processed
> count:
> {code:java}
> 2019-07-10 15:46:14,790 WARN namenode.NameNode: Fsck: Block manager is able
> to process only 0 mis-replicated blocks (Total count : 1 ) for path /...{code}
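For reference, the batching behaviour the description expects can be sketched
outside Hadoop as below. This is only an illustration, not the attached patch:
the class name BatchedLockSketch, the method processInBatches, the
lockAcquisitions counter and the use of a plain ReentrantReadWriteLock in
place of the namesystem lock are all assumptions made for the example. The
only point it demonstrates is that incrementing "processed" inside the inner
loop makes the method return the real count and release the write lock after
each batch.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-alone sketch; none of these names exist in Hadoop.
class BatchedLockSketch {
  // Number of write-lock acquisitions made by the most recent call.
  static int lockAcquisitions = 0;

  static <T> int processInBatches(List<T> items, int batchSize,
      ReentrantReadWriteLock lock) {
    int processed = 0;
    lockAcquisitions = 0;
    Iterator<T> iter = items.iterator();
    try {
      while (!Thread.currentThread().isInterrupted() && iter.hasNext()) {
        int limit = processed + batchSize;
        lock.writeLock().lockInterruptibly();
        lockAcquisitions++;
        try {
          while (iter.hasNext() && processed < limit) {
            iter.next();   // stand-in for processMisReplicatedBlock(blk)
            processed++;   // the increment missing from the quoted method
          }
        } finally {
          // Released after every batch so other lock users can make progress.
          lock.writeLock().unlock();
        }
      }
    } catch (InterruptedException ex) {
      // Preserve the interrupt status for the caller, as the quoted method does.
      Thread.currentThread().interrupt();
    }
    return processed;
  }

  public static void main(String[] args) {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    int n = processInBatches(Arrays.asList("a", "b", "c", "d", "e"), 2, lock);
    // Five items in batches of two: the lock is taken three times.
    System.out.println(n + " processed, " + lockAcquisitions
        + " lock acquisitions");
  }
}
```

Without the increment, "limit" always equals the batch size and "processed"
stays at zero, so the inner loop drains the whole iterator under a single lock
acquisition and the method returns 0, which matches the fsck warning quoted
above.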