[ https://issues.apache.org/jira/browse/HDFS-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16960829#comment-16960829 ]

Fei Hui edited comment on HDFS-14920 at 10/28/19 7:58 AM:
----------------------------------------------------------

{quote}
other storages contains this internal block should be decommissioning
{quote}
This comment was wrong; I have modified it.
The function *countReplicasForStripedBlock* is used to recompute the LIVE 
replica count for the same internal block.
One case that uses it:
{code}
    // Count replicas on decommissioning nodes, as these will not be
    // decommissioned unless recovery/completing last block has finished
    NumberReplicas numReplicas = countNodes(lastBlock);
    int numUsableReplicas = numReplicas.liveReplicas() +
        numReplicas.decommissioning() +
        numReplicas.liveEnteringMaintenanceReplicas();
{code}
I think if the same internal block is counted in liveReplicas and is also 
counted in decommissioning replicas, then 
numReplicas.liveReplicas() + numReplicas.decommissioning() will not make sense, 
because that internal block is counted twice.
So I think the same internal block should be counted either in liveReplicas or 
in decommissioning replicas, but not both.
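To illustrate the rule I mean, here is a minimal, self-contained sketch (the 
Replica record and helper below are hypothetical, not the HDFS code): an 
internal block index with any live replica is counted as LIVE only, and counted 
as decommissioning only when it has no live replica.
{code}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StripedReplicaCounting {
  // Hypothetical replica record: internal block index (0..8 for RS-6-3)
  // and whether the datanode holding it is decommissioning.
  record Replica(int blockIndex, boolean decommissioning) {}

  // Count each internal block index at most once: LIVE wins over
  // DECOMMISSIONING when the same index has replicas in both states.
  static int[] countLiveAndDecommissioning(List<Replica> replicas) {
    Map<Integer, Boolean> hasLive = new HashMap<>();
    for (Replica r : replicas) {
      hasLive.merge(r.blockIndex(), !r.decommissioning(), Boolean::logicalOr);
    }
    int live = 0, decom = 0;
    for (boolean isLive : hasLive.values()) {
      if (isLive) live++; else decom++;
    }
    return new int[] {live, decom};
  }

  public static void main(String[] args) {
    // b0 exists on both an in-service node and a decommissioning node;
    // it must be counted once, as LIVE, not once in each bucket.
    List<Replica> replicas = List.of(
        new Replica(0, false),   // b0 on an in-service node
        new Replica(0, true),    // b0 again on a decommissioning node
        new Replica(1, true));   // b1 only on a decommissioning node
    int[] counts = countLiveAndDecommissioning(replicas);
    System.out.println("live=" + counts[0]
        + ", decommissioning=" + counts[1]);  // live=1, decommissioning=1
  }
}
{code}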



> Erasure Coding: Decommission may hang If one or more datanodes are out of 
> service during decommission  
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-14920
>                 URL: https://issues.apache.org/jira/browse/HDFS-14920
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ec
>    Affects Versions: 3.0.3, 3.2.1, 3.1.3
>            Reporter: Fei Hui
>            Assignee: Fei Hui
>            Priority: Major
>         Attachments: HDFS-14920.001.patch, HDFS-14920.002.patch
>
>
> Decommission test hangs in our clusters.
> We have seen messages like the following:
> {quote}
> 2019-10-22 15:58:51,514 TRACE 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Block 
> blk_-9223372035600425840_372987973 numExpected=9, numLive=5
> 2019-10-22 15:58:51,514 INFO BlockStateChange: Block: 
> blk_-9223372035600425840_372987973, Expected Replicas: 9, live replicas: 5, 
> corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 4, 
> maintenance replicas: 0, live entering maintenance replicas: 0, excess 
> replicas: 0, Is Open File: false, Datanodes having this block: 
> 10.255.43.57:50010 10.255.53.12:50010 10.255.63.12:50010 10.255.62.39:50010 
> 10.255.37.36:50010 10.255.33.15:50010 10.255.69.29:50010 10.255.51.13:50010 
> 10.255.64.15:50010 , Current Datanode: 10.255.69.29:50010, Is current 
> datanode decommissioning: true, Is current datanode entering maintenance: 
> false
> 2019-10-22 15:58:51,514 DEBUG 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Node 
> 10.255.69.29:50010 still has 1 blocks to replicate before it is a candidate 
> to finish Decommission In Progress
> {quote}
> After digging into the source code and cluster logs, I guess it happens in 
> the following steps.
> # Storage strategy is RS-6-3-1024k.
> # EC block b consists of internal blocks b0, b1, b2, b3, b4, b5, b6, b7, b8; 
> b0 is on datanode dn0, b1 is on datanode dn1, etc.
> # At the beginning dn0 is in decommission progress; b0 is replicated 
> successfully, but dn0 is still in decommission progress.
> # Later b1, b2, b3 are on nodes in decommission progress, and dn4 containing 
> b4 is out of service, so b4 needs to be reconstructed; an ErasureCodingWork is 
> created to do it, and in that ErasureCodingWork additionalReplRequired is 4.
> # Because hasAllInternalBlocks is false, it will call 
> ErasureCodingWork#addTaskToDatanode -> 
> DatanodeDescriptor#addBlockToBeErasureCoded, and send a 
> BlockECReconstructionInfo task to the DataNode.
> # The DataNode cannot reconstruct the block because targets is 4, greater 
> than 3 (the parity number); see the sketch below.
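> To make the limit concrete, here is a minimal, hypothetical sketch (plain 
> Java, not the HDFS code): reconstruction must read 6 source blocks out of 9, 
> so one task can serve at most 9 - 6 = 3 targets.
> {code}
> public class EcTargetLimit {
>   public static void main(String[] args) {
>     int dataUnits = 6, parityUnits = 3;      // RS-6-3-1024k
>     // At most (dataUnits + parityUnits) - dataUnits = parityUnits targets.
>     int maxTargets = parityUnits;
>     int targets = 4;                         // additionalReplRequired above
>     System.out.println("can reconstruct: " + (targets <= maxTargets));
>     // Prints "can reconstruct: false" -- the task cannot be served.
>   }
> }
> {code}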
> There is a problem, shown below, in BlockManager.java#scheduleReconstruction:
> {code}
>       // should reconstruct all the internal blocks before scheduling
>       // replication task for decommissioning node(s).
>       if (additionalReplRequired - numReplicas.decommissioning() -
>           numReplicas.liveEnteringMaintenanceReplicas() > 0) {
>         additionalReplRequired = additionalReplRequired -
>             numReplicas.decommissioning() -
>             numReplicas.liveEnteringMaintenanceReplicas();
>       }
> {code}
> Reconstruction should happen first, and only then replication for the 
> decommissioning nodes. Because numReplicas.decommissioning() is 4 and 
> additionalReplRequired is 4, the calculation goes wrong: 
> numReplicas.decommissioning() should be 3, since it should exclude the 
> internal block that already has a live replica. If so, additionalReplRequired 
> will be 1 and reconstruction will be scheduled as expected. After that, 
> decommission can proceed.
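> With the observed numbers, a minimal sketch (again hypothetical, not the HDFS 
> code) of the arithmetic before and after the proposed fix:
> {code}
> public class ReplRequiredMath {
>   public static void main(String[] args) {
>     int additionalReplRequired = 4;
>     int liveEnteringMaintenance = 0;
>     // Buggy count: the internal block that already has a live replica is
>     // also counted as decommissioning, so 4 - 4 - 0 = 0 and the "> 0"
>     // guard fails; additionalReplRequired stays 4, which exceeds parity.
>     int buggyDecommissioning = 4;
>     System.out.println(additionalReplRequired - buggyDecommissioning
>         - liveEnteringMaintenance);          // 0 -> no reduction
>     // Fixed count excludes that internal block: 4 - 3 - 0 = 1 > 0, so
>     // additionalReplRequired becomes 1 and reconstruction is scheduled.
>     int fixedDecommissioning = 3;
>     System.out.println(additionalReplRequired - fixedDecommissioning
>         - liveEnteringMaintenance);          // 1 -> additionalReplRequired = 1
>   }
> }
> {code}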
>  


