[ 
https://issues.apache.org/jira/browse/HDFS-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136419#comment-15136419
 ] 

Andrew Wang commented on HDFS-9600:
-----------------------------------

bq. 1. If the block is not complete, especially if the block is being written 
to right now, we seem to be able to still decommission this node 
(isReplicationInProgress returns false). That may be ok for a replication factor 
bigger than 1 (let the remaining replicas carry on the ongoing write), but 
if it's 1, then we would lose the replica, and the block. Isn't that a problem?

For open files, we only let the node decom if the UC block stays above 
minReplication (default 1). Note the {{curReplicas > minReplication}} check.
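
To make that concrete, here is a minimal sketch of the decision (toy code with 
assumed names, not the actual {{BlockManager}} logic): decommissioning is only 
allowed to proceed past an under-construction block while its remaining live 
replicas stay strictly above {{minReplication}}.

{code:java}
/** Illustrative only: models the "curReplicas > minReplication" check described above. */
public class DecomUcBlockCheck {

  /**
   * Whether an under-construction block lets decommissioning proceed: the live
   * replicas that remain (the decommissioning node is not counted) must still be
   * strictly above the configured minimum replication.
   */
  static boolean ucBlockAllowsDecommission(int curReplicas, int minReplication) {
    return curReplicas > minReplication;
  }

  public static void main(String[] args) {
    int minReplication = 1; // default for dfs.namenode.replication.min
    // Replication factor 3, one replica on the decommissioning node -> 2 live replicas left.
    System.out.println(ucBlockAllowsDecommission(2, minReplication)); // true
    // Replication factor 1: the only replica is on the decommissioning node -> 0 live replicas.
    System.out.println(ucBlockAllowsDecommission(0, minReplication)); // false
  }
}
{code}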

bq. 2. If the block is complete, and the replication factor is 1, similarly, 
the isReplicationInProgress method will return false and we are still able to 
decommission the node.

Note that the {{liveReplicas}} from the {{countNodes}} function does not 
include decommissioning nodes. So for this case {{curReplicas}} will be 0, less 
than the {{curExpectedReplicas}} of 1. The way {{status}} is set is pretty 
confusing in this function.
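
As a toy illustration of why {{curReplicas}} ends up as 0 here (hypothetical 
types; the real {{countNodes}} distinguishes more replica states, e.g. corrupt 
and excess):

{code:java}
import java.util.Arrays;
import java.util.List;

/** Illustrative only: replicas on decommissioning nodes are not counted as live. */
public class CountNodesSketch {

  enum ReplicaState { LIVE, DECOMMISSIONING, CORRUPT }

  /** Count only LIVE replicas, mirroring how decommissioning replicas are excluded. */
  static int liveReplicas(List<ReplicaState> replicas) {
    return (int) replicas.stream().filter(r -> r == ReplicaState.LIVE).count();
  }

  public static void main(String[] args) {
    // Replication factor 1, and that single replica sits on the decommissioning node.
    List<ReplicaState> replicas = Arrays.asList(ReplicaState.DECOMMISSIONING);
    int curReplicas = liveReplicas(replicas);  // 0
    int curExpectedReplicas = 1;
    // 0 < 1, so the node cannot be cleanly decommissioned yet.
    System.out.println(curReplicas < curExpectedReplicas); // true
  }
}
{code}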

One question I have is about where we add blocks to {{neededReplications}} at the 
end of {{isReplicationInProgress}}. We should be checking that the block is 
complete there, right?
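
To make the question concrete, a guard along these lines would only queue 
complete blocks (self-contained sketch with stand-in types; not the actual patch 
or the real HDFS classes):

{code:java}
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: queue a block for re-replication only once it is complete. */
public class ReplicationQueueGuard {

  // Minimal stand-in for the real block type (names are assumptions).
  static class Block {
    final long id;
    final boolean complete;
    Block(long id, boolean complete) { this.id = id; this.complete = complete; }
  }

  final List<Block> neededReplications = new ArrayList<>();

  void maybeQueue(Block block, int curReplicas, int expectedReplicas) {
    if (!block.complete) {
      // Skip under-construction blocks: their replica list is still in flux
      // (e.g. during an append pipeline update), so queueing them is misleading.
      return;
    }
    if (curReplicas < expectedReplicas) {
      neededReplications.add(block);
    }
  }

  public static void main(String[] args) {
    ReplicationQueueGuard guard = new ReplicationQueueGuard();
    guard.maybeQueue(new Block(1L, false), 0, 3); // under construction -> not queued
    guard.maybeQueue(new Block(2L, true), 1, 3);  // complete and under-replicated -> queued
    System.out.println(guard.neededReplications.size()); // prints 1
  }
}
{code}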

> do not check replication if the block is under construction
> -----------------------------------------------------------
>
>                 Key: HDFS-9600
>                 URL: https://issues.apache.org/jira/browse/HDFS-9600
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>            Priority: Critical
>             Fix For: 2.8.0, 2.7.3, 2.6.4
>
>         Attachments: HDFS-9600-branch-2.6.patch, HDFS-9600-branch-2.7.patch, 
> HDFS-9600-branch-2.patch, HDFS-9600-v1.patch, HDFS-9600-v2.patch, 
> HDFS-9600-v3.patch, HDFS-9600-v4.patch
>
>
> When appending to a file, we will update the pipeline to bump to a new GS, and 
> the old GS will be considered out of date. When changing the GS, 
> BlockInfo.setGenerationStampAndVerifyReplicas will remove the replicas that have 
> the old GS, which means we will remove all replicas, because no DN has the new 
> GS until the block with the new GS is added to blockMaps again by 
> DatanodeProtocol.blockReceivedAndDeleted.
> If we check the replication of this block before it is added back, it will be 
> regarded as missing. The probability of this is low, but if there are 
> decommissioning nodes, DecommissionManager.Monitor will scan all blocks 
> belonging to the decommissioning nodes at a very fast speed, so the probability 
> of finding such a "missing" block is very high, even though it is not actually missing. 
> Furthermore, after closing the appended file, 
> FSNamesystem.finalizeINodeFileUnderConstruction will call checkReplication. If 
> some of the nodes are decommissioning, the block with the new GS will be added 
> to the UnderReplicatedBlocks map, so there are two entries with the same block 
> ID in this map: one in QUEUE_WITH_CORRUPT_BLOCKS and the other in 
> QUEUE_HIGHEST_PRIORITY or QUEUE_UNDER_REPLICATED. There will then be many 
> missing-block warnings on the NameNode web UI, but no corrupt files...
> Therefore, I think the solution is that we should not check replication if the 
> block is under construction; we should only check complete blocks.
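
To see the window the quoted description refers to, here is a toy model of the 
append/GS-bump sequence (simplified, assumed names; not the actual 
{{BlockManager}} code): the replica set is emptied when the GS is bumped and only 
refilled once a DN reports the block with the new GS, so a replication check in 
between sees zero replicas.

{code:java}
import java.util.HashSet;
import java.util.Set;

/** Toy model of the window described above: a GS bump clears the replica set
 *  until DNs re-report the block with the new GS. */
public class GenerationStampWindow {

  long generationStamp = 1L;
  final Set<String> replicasWithCurrentGs = new HashSet<>();

  /** Append path: bump the GS; replicas known under the old GS are dropped. */
  void bumpGenerationStamp() {
    generationStamp++;
    replicasWithCurrentGs.clear(); // no DN has reported the new GS yet
  }

  /** DN report path: the block with the new GS is added back. */
  void datanodeReportsNewGs(String datanode) {
    replicasWithCurrentGs.add(datanode);
  }

  boolean looksMissing() {
    return replicasWithCurrentGs.isEmpty();
  }

  public static void main(String[] args) {
    GenerationStampWindow block = new GenerationStampWindow();
    block.datanodeReportsNewGs("dn1");
    block.bumpGenerationStamp();
    // A replication check in this window would (wrongly) see the block as missing.
    System.out.println(block.looksMissing()); // true
    block.datanodeReportsNewGs("dn1");
    System.out.println(block.looksMissing()); // false
  }
}
{code}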



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
