[ 
https://issues.apache.org/jira/browse/HDFS-16989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18031939#comment-18031939
 ] 

ASF GitHub Bot commented on HDFS-16989:
---------------------------------------

github-actions[bot] commented on PR #5593:
URL: https://github.com/apache/hadoop/pull/5593#issuecomment-3430009182

   We're closing this stale PR because it has been open for 100 days with no 
activity. This isn't a judgement on the merit of the PR in any way. It's just a 
way of keeping the PR queue manageable.
   If you feel like this was a mistake, or you would like to continue working 
on it, please feel free to re-open it and ask for a committer to remove the 
stale tag and review again.
   Thanks all for your contribution.




> Large scale block transfer causes too many excess blocks
> --------------------------------------------------------
>
>                 Key: HDFS-16989
>                 URL: https://issues.apache.org/jira/browse/HDFS-16989
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.4.0, 3.3.5
>            Reporter: farmmamba
>            Priority: Critical
>              Labels: pull-request-available
>
> Recently, we changed the replication factor of a directory holding 1.6 PB of
> data from 2 to 3. There are 76 million blocks in this directory. After we
> executed the setrep command (a sample invocation is shown after the logs
> below), the active NameNode printed lots of logs like the following:
> {code:java}
> PendingReconstructionMonitor timed out blk_xxxx_260285131{code}
> and many DataNodes printed lots of duplicated logs like the following:
> {code:java}
> 2023-04-21 13:58:17,627 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(1.1.1.1:50010, 
> datanodeUuid=f3081eac-983f-4c3f-99c8-e4830640ee90, infoPort=50075, 
> infoSecurePort=0, ipcPort=8010, 
> storageInfo=lv=-57;cid=yj-hdfs2;nsid=1882889931;c=1667291826362) Starting 
> thread to transfer 
> BP-578784987-x.x.x.x-1667291826362:blk_1333463885_260285131 to 2.2.2.2:50010 
> 2023-04-21 14:21:21,296 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DataTransfer, at 1.1.1.1:50010: Transmitted 
> BP-578784987-x.x.x.x-1667291826362:blk_1333463885_260285131 
> (numBytes=524384907) to /2.2.2.2:50010
> 2023-04-21 14:34:19,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(1.1.1.1:50010, 
> datanodeUuid=f3081eac-983f-4c3f-99c8-e4830640ee90, infoPort=50075, 
> infoSecurePort=0, ipcPort=8010, 
> storageInfo=lv=-57;cid=yj-hdfs2;nsid=1882889931;c=1667291826362) Starting 
> thread to transfer 
> BP-578784987-x.x.x.x-1667291826362:blk_1333463885_260285131 to 2.2.2.2:50010
> 2023-04-21 14:37:58,207 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(1.1.1.1:50010, 
> datanodeUuid=f3081eac-983f-4c3f-99c8-e4830640ee90, infoPort=50075, 
> infoSecurePort=0, ipcPort=8010, 
> storageInfo=lv=-57;cid=yj-hdfs2;nsid=1882889931;c=1667291826362) Starting 
> thread to transfer 
> BP-578784987-x.x.x.x-1667291826362:blk_1333463885_260285131 to 2.2.2.2:50010
> 2023-04-21 14:40:46,817 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(1.1.1.1:50010, 
> datanodeUuid=f3081eac-983f-4c3f-99c8-e4830640ee90, infoPort=50075, 
> infoSecurePort=0, ipcPort=8010, 
> storageInfo=lv=-57;cid=yj-hdfs2;nsid=1882889931;c=1667291826362) Starting 
> thread to transfer 
> BP-578784987-x.x.x.x-1667291826362:blk_1333463885_260285131 to 2.2.2.2:50010
>  {code}
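> For reference, the replication change above is issued with the standard
> setrep command; a typical invocation (the directory path here is
> hypothetical) looks like:
> {code:bash}
> # Raise the replication factor of the directory from 2 to 3.
> # Adding -w would block until replication finishes; with 76 million
> # blocks that is impractical, so the change proceeds asynchronously.
> hdfs dfs -setrep 3 /path/to/directory
> {code}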
> Curiously, the same block blk_1333463885_260285131 was transferred multiple
> times even though it had already been transmitted successfully. The excess
> transfer requests trigger lots of ReplicaAlreadyExistsException errors on the
> target DataNode, because the replica has already been transmitted and its
> state is FINALIZED.
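> A minimal sketch of why the repeated transfers fail on the receiving side (a
> simplified model, not the actual FsDataset code; the class and method names
> here are illustrative):
> {code:java}
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
> 
> // Simplified model of a DataNode replica map: once a replica is FINALIZED,
> // a further transfer of the same block is rejected, which is what surfaces
> // as ReplicaAlreadyExistsException on the real DataNode.
> class ReplicaMapSketch {
>     enum State { TEMPORARY, FINALIZED }
> 
>     private final Map<Long, State> replicas = new ConcurrentHashMap<>();
> 
>     void receiveTransfer(long blockId) {
>         // Reject the transfer if the block already exists as FINALIZED.
>         State existing = replicas.putIfAbsent(blockId, State.TEMPORARY);
>         if (existing == State.FINALIZED) {
>             throw new IllegalStateException(
>                 "blk_" + blockId + " already exists in state FINALIZED");
>         }
>     }
> 
>     void finalizeReplica(long blockId) {
>         replicas.put(blockId, State.FINALIZED);
>     }
> 
>     public static void main(String[] args) {
>         ReplicaMapSketch dn = new ReplicaMapSketch();
>         dn.receiveTransfer(1333463885L);  // first transfer is accepted
>         dn.finalizeReplica(1333463885L);  // replica becomes FINALIZED
>         dn.receiveTransfer(1333463885L);  // duplicate transfer -> exception
>     }
> }
> {code}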
>  
> The root cause is in RedundancyMonitor#processPendingReconstructions and
> BlockManager#validateReconstructionWork:
> 1. RedundancyMonitor#computeDatanodeWork() generates transfer tasks from
> neededReconstruction via addTaskToDatanode, and puts the tasks into
> pendingReconstruction.
> 2. We set *dfs.namenode.replication.work.multiplier.per.iteration = 200*, and
> this cluster has 400 DataNodes, so RedundancyMonitor may generate up to 200 x
> 400 = 80,000 block transfer tasks per iteration (see the hdfs-site.xml
> excerpt after this list). After dfs.namenode.reconstruction.pending.timeout-sec
> (5 minutes), the requests in pendingReconstruction time out, and the
> PendingReconstructionMonitor thread puts the timed-out requests into
> timedOutItems.
> 3. RedundancyMonitor#processPendingReconstructions() puts the requests from
> timedOutItems back into neededReconstruction.
> 4. The monitor then sleeps:
> TimeUnit.MILLISECONDS.sleep(redundancyRecheckIntervalMs);
> 5. On the next iteration of the while loop,
> RedundancyMonitor#computeDatanodeWork() generates transfer tasks from
> neededReconstruction again, producing repeated tasks for the same block,
> possibly with different target nodes (due to the chooseTarget method); a
> simplified model of this cycle is sketched after this list.
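> The two configuration properties involved, shown here as a minimal
> hdfs-site.xml excerpt with the values described in step 2 (the 300-second
> timeout is the effective default, written out explicitly for clarity):
> {code:xml}
> <!-- hdfs-site.xml excerpt with the values from step 2 -->
> <property>
>   <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
>   <value>200</value>
> </property>
> <property>
>   <!-- Pending reconstruction work times out after this many seconds. -->
>   <name>dfs.namenode.reconstruction.pending.timeout-sec</name>
>   <value>300</value>
> </property>
> {code}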
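> To make the duplicate-scheduling cycle concrete, here is a heavily
> simplified, self-contained model of steps 1-5 (illustrative only; it mirrors
> the shape of RedundancyMonitor but uses none of the real HDFS classes):
> {code:java}
> import java.util.ArrayDeque;
> import java.util.ArrayList;
> import java.util.Iterator;
> import java.util.List;
> import java.util.Queue;
> 
> // Toy model of the cycle in steps 1-5: a block whose transfer is still in
> // flight (or already finished, but not yet reported) when the pending
> // timeout fires is re-queued, so the next iteration schedules it again.
> class RedundancyMonitorSketch {
>     static final long PENDING_TIMEOUT_MS = 5 * 60 * 1000; // 5 minutes
> 
>     record Pending(long blockId, long scheduledAtMs) {}
> 
>     final Queue<Long> neededReconstruction = new ArrayDeque<>();
>     final List<Pending> pendingReconstruction = new ArrayList<>();
> 
>     // Step 1: turn needed blocks into transfer tasks, tracked as pending.
>     void computeDatanodeWork(long nowMs) {
>         Long blockId;
>         while ((blockId = neededReconstruction.poll()) != null) {
>             System.out.println(nowMs + " ms: schedule transfer of blk_" + blockId);
>             pendingReconstruction.add(new Pending(blockId, nowMs));
>         }
>     }
> 
>     // Steps 2-3: timed-out pending entries go back into neededReconstruction,
>     // regardless of whether a DataNode is still transferring the block.
>     void processPendingReconstructions(long nowMs) {
>         Iterator<Pending> it = pendingReconstruction.iterator();
>         while (it.hasNext()) {
>             Pending p = it.next();
>             if (nowMs - p.scheduledAtMs() >= PENDING_TIMEOUT_MS) {
>                 it.remove();
>                 neededReconstruction.add(p.blockId()); // source of duplicates
>             }
>         }
>     }
> 
>     public static void main(String[] args) {
>         RedundancyMonitorSketch m = new RedundancyMonitorSketch();
>         m.neededReconstruction.add(1333463885L);
>         m.computeDatanodeWork(0);                  // iteration 1: task scheduled
>         // (step 4: the real monitor sleeps redundancyRecheckIntervalMs here)
>         m.processPendingReconstructions(300_000);  // timeout hit, re-queued
>         m.computeDatanodeWork(300_000);            // iteration 2: same block again
>     }
> }
> {code}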



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
