[
https://issues.apache.org/jira/browse/HDFS-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961968#comment-16961968
]
Ayush Saxena commented on HDFS-14920:
-------------------------------------
If I go by
{code:java}
But note we exclude duplicated internal block replicas
* for calculating {@link NumberReplicas#liveReplicas}.
{code}
* for decommissioning as well, then it's fine, but you need to add a line here too
explaining that ("If a node is live and decommissioning....", similar to what you
added for the Decom enum).
* Change to lambdas for tests.
* Change the setting of {{decommissioningBitSet.set(blockIndex);}} as I posted
above, if possible. I feel decommissioningBitSet should contain only the bits
that are actually decommissioning.
Apart from that, seems fair enough.
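To make the decommissioningBitSet point concrete, here is a minimal standalone sketch. It is not the patch code: the class, the method name and the two-pass structure are assumptions made for illustration, and plain int arrays stand in for the storages of a block group. A decommissioning internal block is counted only if no live replica with the same block index exists.

```java
import java.util.BitSet;

public class DecommissioningCountDemo {

    // Hypothetical helper (not from BlockManager): counts decommissioning
    // internal blocks, skipping any index that already has a live replica.
    static int countDecommissioning(int[] liveIndices, int[] decommissioningIndices) {
        BitSet liveBitSet = new BitSet();
        for (int blockIndex : liveIndices) {
            liveBitSet.set(blockIndex);
        }
        BitSet decommissioningBitSet = new BitSet();
        for (int blockIndex : decommissioningIndices) {
            // Only record indices that are "actually decommissioning",
            // i.e. not already covered by a live replica.
            if (!liveBitSet.get(blockIndex)) {
                decommissioningBitSet.set(blockIndex);
            }
        }
        return decommissioningBitSet.cardinality();
    }

    public static void main(String[] args) {
        // Scenario from the issue description: b0 has been re-replicated
        // (live copy exists), b1..b3 are decommissioning, b4 is missing,
        // b5..b8 are live.
        int[] live = {0, 5, 6, 7, 8};
        int[] decommissioning = {0, 1, 2, 3};
        System.out.println(countDecommissioning(live, decommissioning)); // 3
    }
}
```

With the inputs taken from the issue description this yields 3 rather than the raw count of 4, which is the value the description argues for.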
> Erasure Coding: Decommission may hang If one or more datanodes are out of
> service during decommission
> -------------------------------------------------------------------------------------------------------
>
> Key: HDFS-14920
> URL: https://issues.apache.org/jira/browse/HDFS-14920
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ec
> Affects Versions: 3.0.3, 3.2.1, 3.1.3
> Reporter: Fei Hui
> Assignee: Fei Hui
> Priority: Major
> Attachments: HDFS-14920.001.patch, HDFS-14920.002.patch,
> HDFS-14920.003.patch
>
>
> The decommission test hangs in our clusters.
> We have seen the following messages:
> {quote}
> 2019-10-22 15:58:51,514 TRACE
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Block
> blk_-9223372035600425840_372987973 numExpected=9, numLive=5
> 2019-10-22 15:58:51,514 INFO BlockStateChange: Block:
> blk_-9223372035600425840_372987973, Expected Replicas: 9, live replicas: 5,
> corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 4,
> maintenance replicas: 0, live entering maintenance replicas: 0, excess
> replicas: 0, Is Open File: false, Datanodes having this block:
> 10.255.43.57:50010 10.255.53.12:50010 10.255.63.12:50010 10.255.62.39:50010
> 10.255.37.36:50010 10.255.33.15:50010 10.255.69.29:50010 10.255.51.13:50010
> 10.255.64.15:50010 , Current Datanode: 10.255.69.29:50010, Is current
> datanode decommissioning: true, Is current datanode entering maintenance:
> false
> 2019-10-22 15:58:51,514 DEBUG
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Node
> 10.255.69.29:50010 still has 1 blocks to replicate before it is a candidate
> to finish Decommission In Progress
> {quote}
> After digging through the source code and cluster logs, my guess is that it
> happens in the following steps.
> # Storage strategy is RS-6-3-1024k.
> # EC block b consists of internal blocks b0, b1, b2, b3, b4, b5, b6, b7, b8;
> b0 is on datanode dn0, b1 is on datanode dn1, and so on.
> # At the beginning dn0 is decommissioning, b0 is replicated successfully,
> and dn0 is still in the decommission-in-progress state.
> # Later the datanodes holding b1, b2, b3 start decommissioning, and dn4,
> which contains b4, goes out of service, so b4 needs to be reconstructed. An
> ErasureCodingWork is created to do it, and in that ErasureCodingWork
> additionalReplRequired is 4.
> # Because hasAllInternalBlocks is false, ErasureCodingWork#addTaskToDatanode ->
> DatanodeDescriptor#addBlockToBeErasureCoded is called, and a
> BlockECReconstructionInfo task is sent to the DataNode.
> # The DataNode cannot reconstruct the block because the number of targets is
> 4, which is greater than 3 (the number of parity blocks).
> The problem is in the following code from
> BlockManager.java#scheduleReconstruction:
> {code}
> // should reconstruct all the internal blocks before scheduling
> // replication task for decommissioning node(s).
> if (additionalReplRequired - numReplicas.decommissioning() -
>     numReplicas.liveEnteringMaintenanceReplicas() > 0) {
>   additionalReplRequired = additionalReplRequired -
>       numReplicas.decommissioning() -
>       numReplicas.liveEnteringMaintenanceReplicas();
> }
> {code}
> Reconstruction should happen first, and then replication for decommissioning.
> Because numReplicas.decommissioning() is 4 and additionalReplRequired is 4,
> the condition is false and additionalReplRequired stays at 4, which is wrong.
> numReplicas.decommissioning() should be 3; it should exclude the internal
> block that already has a live replica.
> If so, additionalReplRequired will be 1 and reconstruction will be scheduled
> as expected. After that, decommission goes on.
>
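The arithmetic in the description above can be checked with a small standalone sketch. The {{adjust}} helper is hypothetical: it only mirrors the quoted condition from scheduleReconstruction, with plain ints standing in for the NumberReplicas getters, and is not the BlockManager code itself.

```java
public class AdditionalReplDemo {

    // Hypothetical stand-in for the quoted block: subtract the decommissioning
    // and live-entering-maintenance counts only if the result stays positive.
    static int adjust(int additionalReplRequired, int decommissioning,
                      int liveEnteringMaintenance) {
        int remaining = additionalReplRequired - decommissioning
            - liveEnteringMaintenance;
        return remaining > 0 ? remaining : additionalReplRequired;
    }

    public static void main(String[] args) {
        // Reported behaviour: decommissioning() counts b0..b3, so nothing is
        // subtracted and 4 targets are requested (more than the 3 parity blocks).
        System.out.println(adjust(4, 4, 0)); // 4

        // Expected behaviour: b0 already has a live replica, so only b1..b3
        // count as decommissioning and a single reconstruction target remains.
        System.out.println(adjust(4, 3, 0)); // 1
    }
}
```

Under RS-6-3 at most 3 internal blocks can be rebuilt from the remaining ones, so a request for 4 targets cannot succeed while a request for 1 can.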