[
https://issues.apache.org/jira/browse/HDFS-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Phil Yang updated HDFS-9600:
----------------------------
Status: Patch Available (was: Open)
> do not check replication if the block is under construction
> -----------------------------------------------------------
>
> Key: HDFS-9600
> URL: https://issues.apache.org/jira/browse/HDFS-9600
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Phil Yang
> Assignee: Phil Yang
> Priority: Critical
> Attachments: HDFS-9600-v1.patch
>
>
> When appending to a file, we update the pipeline to bump a new generation stamp (GS), and the old
> GS is considered out of date. When the GS changes,
> BlockInfo.setGenerationStampAndVerifyReplicas removes the replicas that still have the
> old GS, which means it removes all replicas, because no DN has the new GS until
> the block with the new GS is added back to the BlocksMap by
> DatanodeProtocol.blockReceivedAndDeleted.
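> The following is a toy, self-contained model of that window (hypothetical names, not the
> actual HDFS source): bumping the GS drops every replica that still carries the old GS, so
> the block has zero known locations until the DNs re-report it with the new GS.
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
>
> class GenerationStampSketch {
>   // datanode id -> generation stamp of the replica it holds (toy model)
>   static final Map<String, Long> replicas = new HashMap<>();
>   static long blockGS = 1000L;
>
>   static void setGenerationStampAndVerifyReplicas(long newGS) {
>     blockGS = newGS;
>     // Every replica still carries the old GS, so all of them are removed here.
>     replicas.values().removeIf(gs -> gs < newGS);
>   }
>
>   static void blockReceivedAndDeleted(String dn, long gs) {
>     // Later, the datanode reports the block with the new GS and is re-added.
>     replicas.put(dn, gs);
>   }
>
>   public static void main(String[] args) {
>     replicas.put("dn1", 1000L);
>     replicas.put("dn2", 1000L);
>     replicas.put("dn3", 1000L);
>     setGenerationStampAndVerifyReplicas(1001L);
>     System.out.println(replicas.size()); // 0 -> looks "missing" if checked in this window
>     blockReceivedAndDeleted("dn1", 1001L);
>     System.out.println(replicas.size()); // 1 -> locations come back as DNs report the new GS
>   }
> }
> {code}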
> If we check the replication of this block before it is added back, the block is
> regarded as missing. The probability of hitting this window is normally low, but if there
> are decommissioning nodes, DecommissionManager.Monitor scans all blocks belonging to
> decommissioning nodes very quickly, so the probability of finding such a
> "missing" block is very high, even though the replicas are not actually missing.
> Furthermore, after closing the appended file,
> FSNamesystem.finalizeINodeFileUnderConstruction calls checkReplication, and
> because some of the nodes are decommissioning, the block with the new GS is
> added to the UnderReplicatedBlocks map. As a result there are two entries with the same ID in
> this map: one in QUEUE_WITH_CORRUPT_BLOCKS and the other in
> QUEUE_HIGHEST_PRIORITY or QUEUE_UNDER_REPLICATED. The NameNode web UI then shows many
> missing-block warnings although there are no corrupt files.
> Therefore, I think the solution is to not check replication while the
> block is under construction; we should only check complete blocks.
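> Below is a minimal, self-contained sketch of that guard (hypothetical types, not the
> attached patch): the replication check returns early for blocks that are not yet complete,
> so only complete blocks can be queued as under-replicated.
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
>
> class ReplicationCheckSketch {
>   enum BlockState { UNDER_CONSTRUCTION, COMPLETE }
>
>   static class Block {
>     final long id;
>     final BlockState state;
>     final int liveReplicas;
>     Block(long id, BlockState state, int liveReplicas) {
>       this.id = id; this.state = state; this.liveReplicas = liveReplicas;
>     }
>   }
>
>   // Blocks that look under-replicated; two entries with the same ID is the
>   // symptom described above.
>   static final List<Long> underReplicatedQueue = new ArrayList<>();
>
>   static void checkReplication(Block b, int expectedReplication) {
>     // Proposed guard: an under-construction block may have just had its GS bumped,
>     // so its replica count is temporarily zero and must not be trusted.
>     if (b.state != BlockState.COMPLETE) {
>       return;
>     }
>     if (b.liveReplicas < expectedReplication) {
>       underReplicatedQueue.add(b.id);
>     }
>   }
>
>   public static void main(String[] args) {
>     // Appended block right after the pipeline update: new GS, no reported replicas yet.
>     Block appending = new Block(42L, BlockState.UNDER_CONSTRUCTION, 0);
>     checkReplication(appending, 3);            // skipped, not falsely reported missing
>     Block complete = new Block(43L, BlockState.COMPLETE, 2);
>     checkReplication(complete, 3);             // genuinely under-replicated
>     System.out.println(underReplicatedQueue);  // prints [43]
>   }
> }
> {code}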
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)