[
https://issues.apache.org/jira/browse/HDFS-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072685#comment-15072685
]
Tsz Wo Nicholas Sze commented on HDFS-9600:
-------------------------------------------
Thanks for filing this bug and providing a patch. You are right that we should
not check replication if the block is under construction. Good catch!
For the patch, I think it should check if getUnderConstructionFeature() == null
instead of isComplete(). Do you agree?
> do not check replication if the block is under construction
> -----------------------------------------------------------
>
> Key: HDFS-9600
> URL: https://issues.apache.org/jira/browse/HDFS-9600
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Phil Yang
> Assignee: Phil Yang
> Priority: Critical
> Attachments: HDFS-9600-v1.patch, HDFS-9600-v2.patch
>
>
> When appending a file, we will update pipeline to bump a new GS and the old
> GS will be considered as out of date. When changing GS, in
> BlockInfo.setGenerationStampAndVerifyReplicas we will remove replicas having
> old GS which means we will remove all replicas because no DN has new GS until
> the block with new GS is added to blockMaps again by
> DatanodeProtocol.blockReceivedAndDeleted.
> If we check replication of this block before it is added back, it will be
> regarded as missing. The probability is low but if there are decommissioning
> nodes the DecommissionManager.Monitor will scan all blocks belongs to
> decommissioning nodes with a very fast speed so the probability of finding
> missing block is very high but actually they are not missing.
> Furthermore, after closing the appended file, in
> FSNamesystem.finalizeINodeFileUnderConstruction, it will checkReplication. If
> some of nodes are decommissioning, this block with new GS will be added to
> UnderReplicatedBlocks map so there are two blocks with same ID in this map,
> one is in QUEUE_WITH_CORRUPT_BLOCKS and the other is in
> QUEUE_HIGHEST_PRIORITY or QUEUE_UNDER_REPLICATED. And there will be many
> missing blocks warning in NameNode website but there is no corrupt files...
> Therefore, I think the solution is we should not check replication if the
> block is under construction. We only check complete blocks.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)