[ 
https://issues.apache.org/jira/browse/HDFS-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Yang updated HDFS-9600:
----------------------------
    Attachment: HDFS-9600-branch-2.6.patch
                HDFS-9600-branch-2.7.patch
                HDFS-9600-branch-2.patch

Uploaded patches for branch-2, branch-2.7, and branch-2.6.

In branch-2, the LocatedBlock returned by namenode.updateBlockForPipeline has 
no StorageIDs in it, so I use the old StorageIDs in the test case.

In branch-2.7, the parameter of isNeededReplication is Block, so we need to 
get the BlockInfoContiguous first in order to call isComplete().

In branch-2.6, we use BlockInfo just like 2.7's BlockInfoContiguous. In 
addition, BlockManager.isNeededReplication is private and therefore not 
visible to the test case, so I changed it to package visibility by removing 
the private keyword.
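For context, the branch-2.6 visibility change works because dropping the private keyword in Java yields package-private visibility, which a test class in the same package can use directly. The names below are illustrative stand-ins, not the actual BlockManager code:

```java
// Hedged sketch of the visibility change: a package-private method is
// callable from a test in the same package, unlike a private one.
// BlockManagerSketch and its method are hypothetical, for illustration only.
class BlockManagerSketch {
    // was: private boolean isNeededReplication(...)
    // now: no modifier, i.e. package-private, so same-package tests can call it
    boolean isNeededReplication(int currentReplicas, int expectedReplicas) {
        return currentReplicas < expectedReplicas;
    }
}

public class VisibilitySketch {
    public static void main(String[] args) {
        BlockManagerSketch bm = new BlockManagerSketch();
        // A same-package caller (e.g. a test) can now invoke the method.
        System.out.println(bm.isNeededReplication(2, 3)); // true: under-replicated
    }
}
```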

> do not check replication if the block is under construction
> -----------------------------------------------------------
>
>                 Key: HDFS-9600
>                 URL: https://issues.apache.org/jira/browse/HDFS-9600
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>            Priority: Critical
>         Attachments: HDFS-9600-branch-2.6.patch, HDFS-9600-branch-2.7.patch, 
> HDFS-9600-branch-2.patch, HDFS-9600-v1.patch, HDFS-9600-v2.patch, 
> HDFS-9600-v3.patch, HDFS-9600-v4.patch
>
>
> When appending a file, we update the pipeline to bump a new GS, and the old 
> GS is then considered out of date. When changing the GS, 
> BlockInfo.setGenerationStampAndVerifyReplicas removes the replicas that have 
> the old GS, which means all replicas are removed, because no DN has the new 
> GS until the block with the new GS is added back to blockMaps by 
> DatanodeProtocol.blockReceivedAndDeleted.
> If we check the replication of this block before it is added back, it will 
> be regarded as missing. The probability is low, but if there are 
> decommissioning nodes, DecommissionManager.Monitor scans all blocks 
> belonging to decommissioning nodes very quickly, so the probability of 
> finding a missing block is very high, even though the blocks are not 
> actually missing.
> Furthermore, after closing the appended file, 
> FSNamesystem.finalizeINodeFileUnderConstruction calls checkReplication. If 
> some nodes are decommissioning, the block with the new GS is added to the 
> UnderReplicatedBlocks map, so there are two blocks with the same ID in this 
> map: one in QUEUE_WITH_CORRUPT_BLOCKS and the other in 
> QUEUE_HIGHEST_PRIORITY or QUEUE_UNDER_REPLICATED. As a result, the NameNode 
> web UI shows many missing-block warnings even though there are no corrupt 
> files.
> Therefore, I think the solution is that we should not check replication if 
> the block is under construction; we should only check complete blocks.
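
The proposed guard can be sketched as follows; the class and method names here are illustrative stand-ins for the real HDFS code, not the actual patch:

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: skip under-construction blocks when checking replication.
// BlockSketch is a hypothetical stand-in for the real BlockInfo classes.
class BlockSketch {
    private final long id;
    private final boolean complete; // false while a new GS is not yet reported back

    BlockSketch(long id, boolean complete) {
        this.id = id;
        this.complete = complete;
    }

    boolean isComplete() { return complete; }
    long getId() { return id; }
}

public class ReplicationCheckSketch {
    /**
     * Returns only the blocks that should be checked for replication.
     * Under-construction blocks are skipped, mirroring the fix: a block
     * whose GS was just bumped has no replicas with the new GS reported
     * yet, so checking it would falsely flag it as missing.
     */
    static List<BlockSketch> blocksToCheck(List<BlockSketch> blocks) {
        List<BlockSketch> result = new ArrayList<>();
        for (BlockSketch b : blocks) {
            if (!b.isComplete()) {
                continue; // do not check replication for under-construction blocks
            }
            result.add(b);
        }
        return result;
    }

    public static void main(String[] args) {
        List<BlockSketch> all = new ArrayList<>();
        all.add(new BlockSketch(1L, true));
        all.add(new BlockSketch(2L, false)); // appended, GS bumped, not reported back yet
        System.out.println(blocksToCheck(all).size()); // 1: only the complete block
    }
}
```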



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
