[ 
https://issues.apache.org/jira/browse/HDFS-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui reassigned HDFS-14920:
------------------------------

    Assignee: Fei Hui

> Erasure Coding: Decommission may hang If one or more datanodes are out of 
> service during decommission  
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-14920
>                 URL: https://issues.apache.org/jira/browse/HDFS-14920
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ec
>    Affects Versions: 3.0.3, 3.2.1, 3.1.3
>            Reporter: Fei Hui
>            Assignee: Fei Hui
>            Priority: Major
>
> Decommission test hangs in our clusters.
> Have seen the messages as follow
> {quote}
> 2019-10-22 15:58:51,514 TRACE 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Block 
> blk_-9223372035600425840_372987973 numExpected=9, numLive=5
> 2019-10-22 15:58:51,514 INFO BlockStateChange: Block: 
> blk_-9223372035600425840_372987973, Expected Replicas: 9, live replicas: 5, 
> corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 4, 
> maintenance replicas: 0, live entering maintenance replicas: 0, excess 
> replicas: 0, Is Open File: false, Datanodes having this block: 
> 10.255.43.57:50010 10.255.53.12:50010 10.255.63.12:50010 10.255.62.39:50010 
> 10.255.37.36:50010 10.255.33.15:50010 10.255.69.29:50010 10.255.51.13:50010 
> 10.255.64.15:50010 , Current Datanode: 10.255.69.29:50010, Is current 
> datanode decommissioning: true, Is current datanode entering maintenance: 
> false
> 2019-10-22 15:58:51,514 DEBUG 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Node 
> 10.255.69.29:50010 still has 1 blocks to replicate before it is a candidate 
> to finish Decommission In Progress
> {quote}
> After digging the source code and cluster log,  guess it happens as follow 
> steps.
> # Storage strategy is RS-6-3-1024k.
> # EC block b consists of b0, b1, b2, b3, b4, b5, b6, b7, b8, b0 is from 
> datanode dn0, b1 is from datanode dn1, ...etc
> # At the beginning dn0 is in decommission progress, b0 is replicated 
> successfully, and dn0 is staill in decommission progress.
> # Later b1, b2, b3 in decommission progress, and dn4 containing b4 is out of 
> service, so need to reconstruct, and create ErasureCodingWork to do it, in 
> the ErasureCodingWork, additionalReplRequired is 4
> # Because hasAllInternalBlocks is false, Will call 
> ErasureCodingWork#addTaskToDatanode -> 
> DatanodeDescriptor#addBlockToBeErasureCoded, and send 
> BlockECReconstructionInfo task to Datanode
> # DataNode can not reconstruction the block because targets is 4, greater 
> than 3( parity number).
> There is a problem as follow, from BlockManager.java#scheduleReconstruction
> {code}
>       // should reconstruct all the internal blocks before scheduling
>       // replication task for decommissioning node(s).
>       if (additionalReplRequired - numReplicas.decommissioning() -
>           numReplicas.liveEnteringMaintenanceReplicas() > 0) {
>         additionalReplRequired = additionalReplRequired -
>             numReplicas.decommissioning() -
>             numReplicas.liveEnteringMaintenanceReplicas();
>       }
> {code}
> Should reconstruction firstly and then replicate for decommissioning. Because 
> numReplicas.decommissioning() is 4, and additionalReplRequired is 4, that's 
> wrong.
> numReplicas.decommissioning() should be 3, it should exclude live replica. If 
> so, additionalReplRequired will be 1, reconstruction will schedule as 
> expected. After that, decommission goes on.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to