[ https://issues.apache.org/jira/browse/HDFS-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16960795#comment-16960795 ]
Fei Hui edited comment on HDFS-14920 at 10/28/19 7:23 AM:
----------------------------------------------------------

[~ayushtkn] Thanks for your review.
{code}
// Sub decommissioning because the index replica is live.
if (decommissioningBitSet.get(blockIndex)) {
  counters.subtract(StoredReplicaState.DECOMMISSIONING, 1);
} else {
  decommissioningBitSet.set(blockIndex);
}
{code}
We set the bit for the *blockIndex* internal block because we have already entered the if clause below:
{code}
if (state == StoredReplicaState.LIVE) {
{code}
If the *blockIndex* internal block is in the LIVE state, the same internal block on other storages should not be counted as decommissioning when we compute live and decommissioning replicas. The *blockIndex* internal block is counted as either live or decommissioning; it cannot be both.
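The counting rule above can be sketched outside of HDFS as a minimal, self-contained example. This is an illustrative stand-in, not the actual BlockManager code: the class, enum, and Replica record below are hypothetical types, and the real patch works on StoredReplicaState and storage iteration instead.

```java
import java.util.BitSet;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class EcReplicaCounting {
  enum State { LIVE, DECOMMISSIONING }

  // One stored copy of an internal block: its index within the EC block
  // group and the admin state of the datanode holding it (hypothetical type).
  record Replica(int blockIndex, State state) {}

  // Count each internal block index at most once per state; an index that
  // has a live copy must never also be counted as decommissioning.
  static Map<State, Integer> count(List<Replica> replicas) {
    Map<State, Integer> counters = new EnumMap<>(State.class);
    counters.put(State.LIVE, 0);
    counters.put(State.DECOMMISSIONING, 0);
    BitSet liveBitSet = new BitSet();
    BitSet decommissioningBitSet = new BitSet();
    for (Replica r : replicas) {
      int idx = r.blockIndex();
      if (r.state() == State.LIVE && !liveBitSet.get(idx)) {
        liveBitSet.set(idx);
        counters.merge(State.LIVE, 1, Integer::sum);
        if (decommissioningBitSet.get(idx)) {
          // This index was already counted as decommissioning; undo that,
          // because a live copy of the same index exists.
          counters.merge(State.DECOMMISSIONING, -1, Integer::sum);
        } else {
          // Mark the index so a later decommissioning copy is not counted.
          decommissioningBitSet.set(idx);
        }
      } else if (r.state() == State.DECOMMISSIONING
          && !decommissioningBitSet.get(idx)) {
        decommissioningBitSet.set(idx);
        counters.merge(State.DECOMMISSIONING, 1, Integer::sum);
      }
    }
    return counters;
  }
}
```

For example, replicas (0, DECOMMISSIONING), (0, LIVE), (1, DECOMMISSIONING) yield 1 live and 1 decommissioning, because index 0 is resolved to live regardless of the order in which its copies are visited.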
> Erasure Coding: Decommission may hang if one or more datanodes are out of service during decommission
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-14920
>                 URL: https://issues.apache.org/jira/browse/HDFS-14920
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ec
>    Affects Versions: 3.0.3, 3.2.1, 3.1.3
>            Reporter: Fei Hui
>            Assignee: Fei Hui
>            Priority: Major
>         Attachments: HDFS-14920.001.patch, HDFS-14920.002.patch
>
>
> A decommission test hangs in our clusters.
> We have seen messages like the following:
> {quote}
> 2019-10-22 15:58:51,514 TRACE org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Block blk_-9223372035600425840_372987973 numExpected=9, numLive=5
> 2019-10-22 15:58:51,514 INFO BlockStateChange: Block: blk_-9223372035600425840_372987973, Expected Replicas: 9, live replicas: 5, corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 4, maintenance replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is Open File: false, Datanodes having this block: 10.255.43.57:50010 10.255.53.12:50010 10.255.63.12:50010 10.255.62.39:50010 10.255.37.36:50010 10.255.33.15:50010 10.255.69.29:50010 10.255.51.13:50010 10.255.64.15:50010 , Current Datanode: 10.255.69.29:50010, Is current datanode decommissioning: true, Is current datanode entering maintenance: false
> 2019-10-22 15:58:51,514 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Node 10.255.69.29:50010 still has 1 blocks to replicate before it is a candidate to finish Decommission In Progress
> {quote}
> After digging into the source code and the cluster log, we guess it happens in the following steps:
> # Storage strategy is RS-6-3-1024k.
> # EC block b consists of b0, b1, b2, b3, b4, b5, b6, b7, b8; b0 is on datanode dn0, b1 is on datanode dn1, etc.
> # At the beginning dn0 is in decommission progress; b0 is replicated successfully, and dn0 is still in decommission progress.
> # Later b1, b2, b3 are in decommission progress, and dn4 containing b4 is out of service, so b4 needs to be reconstructed. An ErasureCodingWork is created to do it, and in that ErasureCodingWork additionalReplRequired is 4.
> # Because hasAllInternalBlocks is false, it will call ErasureCodingWork#addTaskToDatanode -> DatanodeDescriptor#addBlockToBeErasureCoded and send a BlockECReconstructionInfo task to a DataNode.
> # The DataNode cannot reconstruct the block because the number of targets is 4, which is greater than 3 (the parity number).
> The problem is in BlockManager#scheduleReconstruction:
> {code}
> // should reconstruct all the internal blocks before scheduling
> // replication task for decommissioning node(s).
> if (additionalReplRequired - numReplicas.decommissioning() -
>     numReplicas.liveEnteringMaintenanceReplicas() > 0) {
>   additionalReplRequired = additionalReplRequired -
>       numReplicas.decommissioning() -
>       numReplicas.liveEnteringMaintenanceReplicas();
> }
> {code}
> Reconstruction should be scheduled first, and then replication for the decommissioning nodes. Because numReplicas.decommissioning() is 4 and additionalReplRequired is 4, the subtraction never happens. That count is wrong: numReplicas.decommissioning() should be 3, since it should exclude the index that already has a live replica. Then additionalReplRequired would be 1, reconstruction would be scheduled as expected, and decommission would go on.
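The arithmetic the description walks through can be checked with a tiny sketch of that guard. This is a simplified stand-in for illustration, not the actual BlockManager#scheduleReconstruction code; the method name and parameters are hypothetical.

```java
public class ReplRequiredSketch {
  // Simplified form of the guard quoted above: only shrink
  // additionalReplRequired when something remains to reconstruct beyond the
  // replication owed to decommissioning / entering-maintenance nodes.
  static int additionalReplRequired(int required, int decommissioning,
                                    int liveEnteringMaintenance) {
    if (required - decommissioning - liveEnteringMaintenance > 0) {
      return required - decommissioning - liveEnteringMaintenance;
    }
    return required;
  }
}
```

With the buggy count (required = 4, decommissioning = 4) the guard fails and the method returns 4, so a 4-target EC reconstruction task is sent to a DataNode and rejected, since 4 exceeds the 3 parity blocks. With the corrected count (decommissioning = 3, excluding the index that has a live replica) it returns 1, and reconstruction is scheduled.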