[ 
https://issues.apache.org/jira/browse/HDFS-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933506#comment-16933506
 ] 

Fei Hui edited comment on HDFS-14849 at 9/19/19 3:26 PM:
---------------------------------------------------------

[~marvelrock] Thanks.
I dug into the code and traced the workflow:
# RedundancyMonitor -> computeDatanodeWork -> computeBlockReconstructionWork -> 
get blocks from *neededReconstruction* -> 
computeReconstructionWorkForBlocks (puts the block into pendingReconstruction in 
validateReconstructionWork) -> scheduleReconstruction -> ErasureCodingWork
# RedundancyMonitor -> processPendingReconstructions -> get timedOutItems from 
pendingReconstruction -> add them back to neededReconstruction

I think more blocks will be replicated without your fix, but once all blocks 
from the decommissioning nodes have been replicated successfully, the block will 
no longer be added to neededReconstruction, because processPendingReconstructions 
recomputes the live replicas. So infinite replication does not happen, right? 
Just some blocks end up with more replicas than needed?
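The two-queue interaction above can be sketched as a toy model. This is only an 
illustrative simulation of the behavior described in this comment, not the real 
BlockManager code; the class and field names merely mirror the HDFS ones, and 
the replica-count check stands in for the recomputation done in 
processPendingReconstructions:

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Toy model of the workflow: blocks move from neededReconstruction to
// pendingReconstruction when work is scheduled, and a timed-out block is
// re-queued ONLY if its recomputed live-replica count is still short.
public class ReconstructionQueuesSketch {
    final Queue<String> neededReconstruction = new ArrayDeque<>();
    final Set<String> pendingReconstruction = new HashSet<>();

    // Step 1: schedule reconstruction work for the next needed block.
    void computeBlockReconstructionWork() {
        String block = neededReconstruction.poll();
        if (block != null) {
            pendingReconstruction.add(block); // as in validateReconstructionWork
        }
    }

    // Step 2: handle a timed-out item; re-add it to neededReconstruction
    // only when live replicas are still below the requirement.
    void processTimedOutItem(String block, int liveReplicas, int required) {
        if (pendingReconstruction.remove(block) && liveReplicas < required) {
            neededReconstruction.add(block);
        }
    }

    public static void main(String[] args) {
        ReconstructionQueuesSketch bm = new ReconstructionQueuesSketch();
        bm.neededReconstruction.add("blk_1");
        bm.computeBlockReconstructionWork();
        // The block timed out, but replication already succeeded elsewhere:
        // the recomputed live-replica count meets the requirement, so the
        // block is NOT re-queued and replication stops.
        bm.processTimedOutItem("blk_1", 9, 9);
        System.out.println(bm.neededReconstruction.isEmpty()); // true
    }
}
```

In this model the loop terminates exactly because step 2 rechecks the live 
replicas; the extra replicas created before the check succeeds are the 
"some blocks are more than one" the comment asks about.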



> Erasure Coding: replicate block infinitely when datanode being decommissioning
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-14849
>                 URL: https://issues.apache.org/jira/browse/HDFS-14849
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.3.0
>            Reporter: HuangTao
>            Assignee: HuangTao
>            Priority: Major
>              Labels: EC, HDFS, NameNode
>         Attachments: HDFS-14849.001.patch, HDFS-14849.002.patch, 
> fsck-file.png, liveBlockIndices.png, scheduleReconstruction.png
>
>
> When a datanode stays in DECOMMISSION_INPROGRESS status, the EC blocks on 
> that datanode are replicated infinitely.
> // added 2019/09/19
> I reproduced this scenario in a 163-node cluster by decommissioning 100 nodes 
> simultaneously.
>  !scheduleReconstruction.png! 
>  !fsck-file.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
