[ https://issues.apache.org/jira/browse/HDFS-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16960795#comment-16960795 ]
Fei Hui edited comment on HDFS-14920 at 10/28/19 7:23 AM:
----------------------------------------------------------

[~ayushtkn] Thanks for your review.
{code}
// Sub decommissioning because the index replica is live.
if (decommissioningBitSet.get(blockIndex)) {
  counters.subtract(StoredReplicaState.DECOMMISSIONING, 1);
} else {
  decommissioningBitSet.set(blockIndex);
}
{code}
We set the bit for the *blockIndex* internal block because we have already entered the if clause below:
{code}
if (state == StoredReplicaState.LIVE) {
{code}
If the *blockIndex* internal block is in the LIVE state, the same internal block on other storages should not be counted as decommissioning when we compute live and decommissioning replicas. The *blockIndex* internal block is counted as either live or decommissioning; it cannot be both.
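The counting rule above can be sketched outside of HDFS as a minimal, self-contained example. This is an illustrative stand-in, not the actual BlockManager code: the class, enum, and Replica record below are hypothetical types, and the real patch works on StoredReplicaState and storage iteration instead.

```java
import java.util.BitSet;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class EcReplicaCounting {
  enum State { LIVE, DECOMMISSIONING }

  // One stored copy of an internal block: its index within the EC block
  // group and the admin state of the datanode holding it (hypothetical type).
  record Replica(int blockIndex, State state) {}

  // Count each internal block index at most once per state; an index that
  // has a live copy must never also be counted as decommissioning.
  static Map<State, Integer> count(List<Replica> replicas) {
    Map<State, Integer> counters = new EnumMap<>(State.class);
    counters.put(State.LIVE, 0);
    counters.put(State.DECOMMISSIONING, 0);
    BitSet liveBitSet = new BitSet();
    BitSet decommissioningBitSet = new BitSet();
    for (Replica r : replicas) {
      int idx = r.blockIndex();
      if (r.state() == State.LIVE && !liveBitSet.get(idx)) {
        liveBitSet.set(idx);
        counters.merge(State.LIVE, 1, Integer::sum);
        if (decommissioningBitSet.get(idx)) {
          // This index was already counted as decommissioning; undo that,
          // because a live copy of the same index exists.
          counters.merge(State.DECOMMISSIONING, -1, Integer::sum);
        } else {
          // Mark the index so a later decommissioning copy is not counted.
          decommissioningBitSet.set(idx);
        }
      } else if (r.state() == State.DECOMMISSIONING
          && !decommissioningBitSet.get(idx)) {
        decommissioningBitSet.set(idx);
        counters.merge(State.DECOMMISSIONING, 1, Integer::sum);
      }
    }
    return counters;
  }
}
```

For example, replicas (0, DECOMMISSIONING), (0, LIVE), (1, DECOMMISSIONING) yield 1 live and 1 decommissioning, because index 0 is resolved to live regardless of the order in which its copies are visited.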
> Erasure Coding: Decommission may hang if one or more datanodes are out of service during decommission
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-14920
>                 URL: https://issues.apache.org/jira/browse/HDFS-14920
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ec
>    Affects Versions: 3.0.3, 3.2.1, 3.1.3
>            Reporter: Fei Hui
>            Assignee: Fei Hui
>            Priority: Major
>         Attachments: HDFS-14920.001.patch, HDFS-14920.002.patch
>
>
> A decommission test hangs in our clusters.
> We have seen messages like the following:
> {quote}
> 2019-10-22 15:58:51,514 TRACE org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Block blk_-9223372035600425840_372987973 numExpected=9, numLive=5
> 2019-10-22 15:58:51,514 INFO BlockStateChange: Block: blk_-9223372035600425840_372987973, Expected Replicas: 9, live replicas: 5, corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 4, maintenance replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is Open File: false, Datanodes having this block: 10.255.43.57:50010 10.255.53.12:50010 10.255.63.12:50010 10.255.62.39:50010 10.255.37.36:50010 10.255.33.15:50010 10.255.69.29:50010 10.255.51.13:50010 10.255.64.15:50010 , Current Datanode: 10.255.69.29:50010, Is current datanode decommissioning: true, Is current datanode entering maintenance: false
> 2019-10-22 15:58:51,514 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Node 10.255.69.29:50010 still has 1 blocks to replicate before it is a candidate to finish Decommission In Progress
> {quote}
> After digging into the source code and the cluster log, we guess it happens in the following steps:
> # Storage strategy is RS-6-3-1024k.
> # EC block b consists of b0, b1, b2, b3, b4, b5, b6, b7, b8; b0 is on datanode dn0, b1 is on datanode dn1, etc.
> # At the beginning dn0 is in decommission progress; b0 is replicated successfully, and dn0 is still in decommission progress.
> # Later b1, b2, b3 are in decommission progress, and dn4 containing b4 is out of service, so b4 needs to be reconstructed. An ErasureCodingWork is created to do it, and in that ErasureCodingWork additionalReplRequired is 4.
> # Because hasAllInternalBlocks is false, it will call ErasureCodingWork#addTaskToDatanode -> DatanodeDescriptor#addBlockToBeErasureCoded and send a BlockECReconstructionInfo task to a DataNode.
> # The DataNode cannot reconstruct the block because the number of targets is 4, which is greater than 3 (the parity number).
> The problem is in BlockManager#scheduleReconstruction:
> {code}
> // should reconstruct all the internal blocks before scheduling
> // replication task for decommissioning node(s).
> if (additionalReplRequired - numReplicas.decommissioning() -
>     numReplicas.liveEnteringMaintenanceReplicas() > 0) {
>   additionalReplRequired = additionalReplRequired -
>       numReplicas.decommissioning() -
>       numReplicas.liveEnteringMaintenanceReplicas();
> }
> {code}
> Reconstruction should be scheduled first, and then replication for the decommissioning nodes. Because numReplicas.decommissioning() is 4 and additionalReplRequired is 4, the subtraction never happens. That count is wrong: numReplicas.decommissioning() should be 3, since it should exclude the index that already has a live replica. Then additionalReplRequired would be 1, reconstruction would be scheduled as expected, and decommission would go on.
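The arithmetic the description walks through can be checked with a tiny sketch of that guard. This is a simplified stand-in for illustration, not the actual BlockManager#scheduleReconstruction code; the method name and parameters are hypothetical.

```java
public class ReplRequiredSketch {
  // Simplified form of the guard quoted above: only shrink
  // additionalReplRequired when something remains to reconstruct beyond the
  // replication owed to decommissioning / entering-maintenance nodes.
  static int additionalReplRequired(int required, int decommissioning,
                                    int liveEnteringMaintenance) {
    if (required - decommissioning - liveEnteringMaintenance > 0) {
      return required - decommissioning - liveEnteringMaintenance;
    }
    return required;
  }
}
```

With the buggy count (required = 4, decommissioning = 4) the guard fails and the method returns 4, so a 4-target EC reconstruction task is sent to a DataNode and rejected, since 4 exceeds the 3 parity blocks. With the corrected count (decommissioning = 3, excluding the index that has a live replica) it returns 1, and reconstruction is scheduled.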