[ https://issues.apache.org/jira/browse/HDFS-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135372#comment-15135372 ]

Yongjun Zhang commented on HDFS-9600:
-------------------------------------

Hi [~yangzhe1991], [~szetszwo] and [~vinayrpet],

Thanks for your earlier work here. I noticed the following code in branch-2.6:

{code}
  /**
   * Return true if there are any blocks on this node that have not
   * yet reached their replication factor. Otherwise returns false.
   */
  boolean isReplicationInProgress(DatanodeDescriptor srcNode) {
    boolean status = false;
    boolean firstReplicationLog = true;
    int underReplicatedBlocks = 0;
    int decommissionOnlyReplicas = 0;
    int underReplicatedInOpenFiles = 0;
    final Iterator<? extends Block> it = srcNode.getBlockIterator();
    while(it.hasNext()) {
      final Block block = it.next();
      BlockCollection bc = blocksMap.getBlockCollection(block);

      if (bc != null) {
        NumberReplicas num = countNodes(block);
        int curReplicas = num.liveReplicas();
        int curExpectedReplicas = getReplication(block);

        if (curReplicas < curExpectedReplicas
            || !isPlacementPolicySatisfied(block)) {
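          // ... rest of the method elided ...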
{code}

The {{blockInfo.isComplete()}} clause was added to the following method but not 
to the method above. Would any of you please explain why 
{{blockInfo.isComplete()}} is not needed in the code above? Thanks a lot.

{code}
  boolean isNeededReplication(Block b, int expected, int current) {
    // Resolve the stored BlockInfo so the block's state can be inspected.
    BlockInfo blockInfo;
    if (b instanceof BlockInfo) {
      blockInfo = (BlockInfo) b;
    } else {
      blockInfo = getStoredBlock(b);
    }
    // Only COMPLETE blocks are candidates for replication work.
    return blockInfo.isComplete()
        && (current < expected || !isPlacementPolicySatisfied(b));
  }
{code}
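
For concreteness, the kind of guard I am asking about would look roughly like 
the sketch below if it were applied inside the branch-2.6 loop. This is purely 
hypothetical (it is not from any attached patch) and assumes 
{{blocksMap.getStoredBlock(block)}} can resolve the stored {{BlockInfo}}:

{code}
      if (bc != null) {
        // Hypothetical guard, mirroring isNeededReplication: resolve the
        // stored BlockInfo and skip blocks that are not yet COMPLETE.
        BlockInfo blockInfo = blocksMap.getStoredBlock(block);
        if (blockInfo == null || !blockInfo.isComplete()) {
          continue;
        }
        // ... existing replication checks unchanged ...
      }
{code}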



> do not check replication if the block is under construction
> -----------------------------------------------------------
>
>                 Key: HDFS-9600
>                 URL: https://issues.apache.org/jira/browse/HDFS-9600
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>            Priority: Critical
>             Fix For: 2.8.0, 2.7.3, 2.6.4
>
>         Attachments: HDFS-9600-branch-2.6.patch, HDFS-9600-branch-2.7.patch, 
> HDFS-9600-branch-2.patch, HDFS-9600-v1.patch, HDFS-9600-v2.patch, 
> HDFS-9600-v3.patch, HDFS-9600-v4.patch
>
>
> When appending a file, we update the pipeline to bump a new generation stamp 
> (GS), and the old GS is then considered out of date. When changing the GS, 
> BlockInfo.setGenerationStampAndVerifyReplicas removes replicas that still 
> carry the old GS, which means it removes all replicas, because no DN has the 
> new GS until the block with the new GS is added back to the blocks map by 
> DatanodeProtocol.blockReceivedAndDeleted.
> If we check the replication of this block before it is added back, it will 
> be regarded as missing. The probability of hitting this window is low, but 
> if there are decommissioning nodes, DecommissionManager.Monitor scans all 
> blocks belonging to decommissioning nodes very quickly, so the probability 
> of reporting a "missing" block is very high, even though the blocks are not 
> actually missing.
> Furthermore, after closing the appended file, 
> FSNamesystem.finalizeINodeFileUnderConstruction calls checkReplication. If 
> some of the nodes are decommissioning, the block with the new GS is added to 
> the UnderReplicatedBlocks map, so there are two blocks with the same ID in 
> this map: one in QUEUE_WITH_CORRUPT_BLOCKS and the other in 
> QUEUE_HIGHEST_PRIORITY or QUEUE_UNDER_REPLICATED. As a result, the NameNode 
> web UI shows many missing-block warnings even though there are no corrupt 
> files.
> Therefore, I think the solution is that we should not check replication if 
> the block is under construction; we should only check complete blocks.
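
To make the race described in the quoted report concrete, below is a 
self-contained toy model (plain Java, not HDFS source; all names are 
illustrative) of the generation-stamp bump and the proposed 
under-construction guard:

{code}
import java.util.HashSet;
import java.util.Set;

/** Toy model of the HDFS-9600 race; illustrative only, not HDFS source. */
public class GenStampRaceSketch {
  static class ToyBlock {
    long genStamp;
    boolean underConstruction = true;
    final Set<String> replicasWithCurrentGS = new HashSet<>();
  }

  /**
   * Analogue of BlockInfo.setGenerationStampAndVerifyReplicas: bumping the
   * GS invalidates every replica reported under the old GS.
   */
  static void bumpGenerationStamp(ToyBlock b, long newGS) {
    b.genStamp = newGS;
    b.replicasWithCurrentGS.clear(); // no DN has reported the new GS yet
  }

  /**
   * Replication check with the proposed guard: under-construction blocks
   * are never reported as missing.
   */
  static boolean looksMissing(ToyBlock b) {
    return !b.underConstruction && b.replicasWithCurrentGS.isEmpty();
  }

  public static void main(String[] args) {
    ToyBlock b = new ToyBlock();
    b.replicasWithCurrentGS.add("dn1");
    b.replicasWithCurrentGS.add("dn2");

    bumpGenerationStamp(b, 1001L);        // append() updates the pipeline
    System.out.println(looksMissing(b));  // false: the guard skips UC blocks

    b.replicasWithCurrentGS.add("dn1");   // blockReceivedAndDeleted re-adds
    b.underConstruction = false;          // file closed, block now COMPLETE
    System.out.println(looksMissing(b));  // false: a replica is reported again
  }
}
{code}

Without the {{underConstruction}} guard, {{looksMissing}} would return true in 
the window between the GS bump and the first {{blockReceivedAndDeleted}}, 
which is exactly the spurious "missing block" the report describes.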



