[ https://issues.apache.org/jira/browse/HDFS-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805895#comment-17805895 ]

ASF GitHub Bot commented on HDFS-17036:
---------------------------------------

LiuGuH closed pull request #5710: HDFS-17036. Limit pendingReconstruction size
URL: https://github.com/apache/hadoop/pull/5710




> Limit pendingReconstruction size
> --------------------------------
>
>                 Key: HDFS-17036
>                 URL: https://issues.apache.org/jira/browse/HDFS-17036
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 3.4.0
>            Reporter: liuguanghua
>            Assignee: liuguanghua
>            Priority: Major
>              Labels: pull-request-available
>
> Consider the following scenario:
> (1) The HDFS cluster is sufficiently large.
> Namenode config:
> DFS_NAMENODE_REPLICATION_WORK_MULTIPLIER_PER_ITERATION -> 200
> DFS_NAMENODE_REPLICATION_MAX_STREAMS_KEY -> 100
> DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_KEY -> 200
> Datanode:
> Datanodes have large disks, so a single datanode can hold millions or even 
> tens of millions of blocks, or more in an archive cluster.
> (2) The cluster administrator performs one of the following actions:
> 2.1 Increasing the replication factor on a large directory, e.g. 2 PB
> 2.2 Decommissioning some nodes from the cluster
> (3) The datanodes are under heavy read and write load.
>  
> In this scenario, datanodes receive transfer commands via heartbeats from 
> the namenode. When a datanode is under heavy load (reads and writes), the 
> SumOfActorCommandQueueLength metric can grow to 1k+ because commands are 
> consumed slowly due to lock contention. See 
> BPServiceActor.CommandProcessingThread for details.
>  
> The namenode distributes all replication work to datanodes through 
> heartbeats. All under-replicated blocks are placed in neededReconstruction, 
> and each round distributes up to 
> liveDatanodes * DFS_NAMENODE_REPLICATION_WORK_MULTIPLIER_PER_ITERATION blocks 
> to the related datanodes. But as mentioned above, the datanodes are blocked 
> executing the transfer commands. After the default 5-minute timeout, blocks 
> in pendingReconstruction time out, re-enter neededReconstruction, and then 
> re-enter pendingReconstruction, so the datanodes receive multiple transfer 
> commands for the same block. That leads to excess replicas.
> As this cycle continues, the consequences become serious: datanodes 
> accumulate more transfer commands in their queues, waste more disk space, 
> and produce more excess replicas.
>  
> So, I think we should limit the max size of pendingReconstruction.
>  
>  
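For illustration: with 1,000 live datanodes and the work multiplier of 200 from the configuration above, up to 1,000 * 200 = 200,000 blocks can enter pendingReconstruction per round. A minimal sketch of the proposed cap follows; this is not Hadoop's actual PendingReconstructionBlocks API, and the class, cap, and method names here are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the idea in HDFS-17036: refuse new entries once
 * pendingReconstruction reaches a configurable cap, so that timed-out
 * blocks cannot keep re-entering and triggering duplicate transfer
 * commands. Names and the cap value are illustrative only.
 */
class BoundedPendingReconstruction {
    private final int maxSize; // hypothetical limit on pending blocks
    // blockId -> number of replicas still being transferred
    private final Map<Long, Integer> pending = new HashMap<>();

    BoundedPendingReconstruction(int maxSize) {
        this.maxSize = maxSize;
    }

    /**
     * Returns false when the cap is reached for a new block; the caller
     * would then leave the block in neededReconstruction for a later round
     * instead of issuing another transfer command.
     */
    synchronized boolean increment(long blockId, int numTargets) {
        if (!pending.containsKey(blockId) && pending.size() >= maxSize) {
            return false;
        }
        pending.merge(blockId, numTargets, Integer::sum);
        return true;
    }

    /** Called when one replica of the block has been reconstructed. */
    synchronized void decrement(long blockId) {
        Integer count = pending.get(blockId);
        if (count != null) {
            if (count <= 1) {
                pending.remove(blockId);
            } else {
                pending.put(blockId, count - 1);
            }
        }
    }

    synchronized int size() {
        return pending.size();
    }
}
```

Under this sketch, a full pendingReconstruction simply defers scheduling rather than piling up commands that will time out and be reissued.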



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
