[ https://issues.apache.org/jira/browse/HDFS-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933506#comment-16933506 ]
Fei Hui edited comment on HDFS-14849 at 9/19/19 3:26 PM: --------------------------------------------------------- [~marvelrock] Thanks. I dug into the code and found the workflow:
# RedundancyMonitor -> computeDatanodeWork -> computeBlockReconstructionWork -> get blocks from *neededReconstruction* -> computeReconstructionWorkForBlocks (puts the block into pendingReconstruction in validateReconstructionWork) -> scheduleReconstruction -> ErasureCodingWork
# RedundancyMonitor -> processPendingReconstructions -> get timedOutItems from pendingReconstruction -> add them back to neededReconstruction

I think it will replicate extra blocks without your fix. But after all the blocks from the decommissioning nodes have replicated successfully, the block will no longer be added to neededReconstruction, because the processPendingReconstructions function recomputes the live replicas. So replicating a block infinitely does not happen, is that right? Just some blocks end up with more replicas than needed?
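To make the two steps above concrete, here is a minimal, self-contained model of that re-queue loop. This is a sketch, not the real BlockManager code: the class and field names (ReconstructionModel, neededReconstruction, pendingReconstruction) only loosely mirror the HDFS identifiers, and the live-replica check is simplified to a map lookup.

```java
import java.util.*;

// Hypothetical model of the RedundancyMonitor re-queue loop; not the real
// BlockManager implementation.
class ReconstructionModel {
    final Set<String> neededReconstruction = new LinkedHashSet<>();
    // block id -> time it was scheduled into pendingReconstruction
    final Map<String, Long> pendingReconstruction = new HashMap<>();
    final long timeoutMs;

    ReconstructionModel(long timeoutMs) { this.timeoutMs = timeoutMs; }

    // Step 1: computeBlockReconstructionWork — take blocks from
    // neededReconstruction and put them into pendingReconstruction.
    void computeBlockReconstructionWork(long now) {
        for (Iterator<String> it = neededReconstruction.iterator(); it.hasNext(); ) {
            pendingReconstruction.put(it.next(), now);
            it.remove();
        }
    }

    // Step 2: processPendingReconstructions — timed-out blocks go back to
    // neededReconstruction, but only if the recomputed live-replica count is
    // still below the required number.
    void processPendingReconstructions(long now, Map<String, Integer> liveReplicas, int required) {
        for (Iterator<Map.Entry<String, Long>> it = pendingReconstruction.entrySet().iterator(); it.hasNext(); ) {
            Map.Entry<String, Long> e = it.next();
            if (now - e.getValue() >= timeoutMs) {
                if (liveReplicas.getOrDefault(e.getKey(), 0) < required) {
                    neededReconstruction.add(e.getKey());
                }
                it.remove();
            }
        }
    }
}
```

In this model, a timed-out block is re-queued only while its live replicas are still short, which is why the loop terminates once replication succeeds, though extra replicas can be scheduled for blocks that timed out while their copies were still in flight.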
> Erasure Coding: replicate block infinitely when datanode being decommissioning
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-14849
>                 URL: https://issues.apache.org/jira/browse/HDFS-14849
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.3.0
>            Reporter: HuangTao
>            Assignee: HuangTao
>            Priority: Major
>              Labels: EC, HDFS, NameNode
>         Attachments: HDFS-14849.001.patch, HDFS-14849.002.patch,
> fsck-file.png, liveBlockIndices.png, scheduleReconstruction.png
>
> While a datanode stays in DECOMMISSION_INPROGRESS status, the EC blocks on
> that datanode are replicated infinitely.
> // added 2019/09/19
> I reproduced this scenario on a 163-node cluster by decommissioning 100 nodes
> simultaneously.
> !scheduleReconstruction.png!
> !fsck-file.png!

-- This message was sent by Atlassian Jira (v8.3.4#803005)