[jira] [Updated] (HDFS-16989) Large scale block transfer causes too many excess blocks

farmmamba (Jira) Sun, 23 Apr 2023 08:37:37 -0700


     [ 
https://issues.apache.org/jira/browse/HDFS-16989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


farmmamba updated HDFS-16989:
-----------------------------
    Description: 
Recently, we changed the replication factor of a directory holding 1.6 PB of 
data from 2 to 3. The directory contains 76 million blocks. After we ran the 
setrep command, the active NameNode printed many log lines like this:
{code:java}
PendingReconstructionMonitor timed out blk_xxxx_260285131{code}
and many DataNodes printed large numbers of duplicate log lines like these: 
{code:java}
2023-04-21 13:58:17,627 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(1.1.1.1:50010, 
datanodeUuid=f3081eac-983f-4c3f-99c8-e4830640ee90, infoPort=50075, 
infoSecurePort=0, ipcPort=8010, 
storageInfo=lv=-57;cid=yj-hdfs2;nsid=1882889931;c=1667291826362) Starting 
thread to transfer BP-578784987-x.x.x.x-1667291826362:blk_1333463885_260285131 
to 2.2.2.2:50010 
2023-04-21 14:21:21,296 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DataTransfer, at 1.1.1.1:50010: Transmitted 
BP-578784987-x.x.x.x-1667291826362:blk_1333463885_260285131 
(numBytes=524384907) to /2.2.2.2:50010
14:34:19,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(1.1.1.1:50010, 
datanodeUuid=f3081eac-983f-4c3f-99c8-e4830640ee90, infoPort=50075, 
infoSecurePort=0, ipcPort=8010, 
storageInfo=lv=-57;cid=yj-hdfs2;nsid=1882889931;c=1667291826362) Starting 
thread to transfer BP-578784987-x.x.x.x-1667291826362:blk_1333463885_260285131 
to 2.2.2.2:50010
2023-04-21 14:37:58,207 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(1.1.1.1:50010, 
datanodeUuid=f3081eac-983f-4c3f-99c8-e4830640ee90, infoPort=50075, 
infoSecurePort=0, ipcPort=8010, 
storageInfo=lv=-57;cid=yj-hdfs2;nsid=1882889931;c=1667291826362) Starting 
thread to transfer BP-578784987-x.x.x.x-1667291826362:blk_1333463885_260285131 
to 2.2.2.2:50010
14:40:46,817 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(1.1.1.1:50010, 
datanodeUuid=f3081eac-983f-4c3f-99c8-e4830640ee90, infoPort=50075, 
infoSecurePort=0, ipcPort=8010, 
storageInfo=lv=-57;cid=yj-hdfs2;nsid=1882889931;c=1667291826362) Starting 
thread to transfer BP-578784987-x.x.x.x-1667291826362:blk_1333463885_260285131 
to 2.2.2.2:50010
 {code}
Oddly, the same block, blk_1333463885_260285131, was transferred multiple 
times even though it had already been transmitted successfully. The excess 
transfer requests trigger many ReplicaAlreadyExistsException errors on the 
target DataNode, because the replica has already been transmitted and is in 
the FINALIZED state.
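
As a rough check on the timing in the DataNode log, the sketch below computes 
how long the first transfer took (from "Starting thread to transfer" at 
13:58:17,627 to "Transmitted" at 14:21:21,296) and compares it with a pending 
reconstruction timeout. The class and method names are hypothetical, and the 
300-second value is an assumption: it is the documented default of 
dfs.namenode.reconstruction.pending.timeout-sec, but the report does not say 
what this cluster uses.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class TransferTiming {
    // Timestamp layout of the DataNode log lines quoted in the report.
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS");

    /** Elapsed time between two log timestamps. */
    public static Duration elapsed(String start, String end) {
        return Duration.between(LocalDateTime.parse(start, LOG_TS),
                                LocalDateTime.parse(end, LOG_TS));
    }

    /** Average throughput of a transfer, in bytes per second. */
    public static double bytesPerSecond(long numBytes, Duration d) {
        return numBytes / (d.toMillis() / 1000.0);
    }

    public static void main(String[] args) {
        // Both timestamps and numBytes are taken from the log excerpt.
        Duration d = elapsed("2023-04-21 13:58:17,627",
                             "2023-04-21 14:21:21,296");
        long numBytes = 524384907L;

        // ASSUMPTION: default pending reconstruction timeout, not stated
        // in the report.
        long assumedTimeoutSec = 300;

        System.out.println("transfer took " + d.getSeconds() + " s");
        System.out.printf("average rate %.0f bytes/s%n",
                          bytesPerSecond(numBytes, d));
        System.out.println("exceeds assumed timeout: "
                           + (d.getSeconds() > assumedTimeoutSec));
    }
}
```

Under that assumption the single transfer (about 1383 seconds) runs several 
times longer than the timeout, which would be consistent with the 
PendingReconstructionMonitor messages on the NameNode.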

 

The cause lies in RedundancyMonitor#processPendingReconstructions and 
BlockManager#validateReconstructionWork.
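
The report does not spell out the mechanism, so the following is only a toy 
model of one plausible reading, not HDFS code: if every pending-reconstruction 
timeout puts the block back in the queue and a new transfer command is issued 
immediately, then a transfer that outlives the timeout produces duplicate 
commands. The class, the method names, and the 300-second timeout are all 
hypothetical. The 1383-second transfer time is derived from the log 
timestamps above (13:58:17 to 14:21:21).

```java
import java.util.ArrayList;
import java.util.List;

public class PendingTimeoutModel {
    /**
     * Simulated times (seconds after the first command) at which a transfer
     * command is issued for one block. Toy rule: a new command is issued at
     * every timeout until the first transfer has completed.
     */
    public static List<Long> commandTimes(long transferSec, long timeoutSec) {
        if (timeoutSec <= 0) {
            throw new IllegalArgumentException("timeoutSec must be positive");
        }
        List<Long> times = new ArrayList<>();
        for (long t = 0; t < transferSec; t += timeoutSec) {
            times.add(t);
        }
        return times;
    }

    /** Commands beyond the first one, i.e. redundant transfers. */
    public static int duplicates(long transferSec, long timeoutSec) {
        return Math.max(0, commandTimes(transferSec, timeoutSec).size() - 1);
    }

    public static void main(String[] args) {
        long transferSec = 1383; // from the log excerpt
        long timeoutSec = 300;   // ASSUMPTION: not stated in the report
        System.out.println("commands issued at: "
                           + commandTimes(transferSec, timeoutSec));
        System.out.println("duplicate commands: "
                           + duplicates(transferSec, timeoutSec));
    }
}
```

In this model, each duplicate command that reaches the target after the first 
copy is FINALIZED would be rejected, matching the 
ReplicaAlreadyExistsException described above. The real scheduling involves 
more state than this sketch captures.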

 

It can be reproduced by ...

  was:
Recently, we changed the replication factor of a directory holding 1.6 PB of 
data from 2 to 3. The directory contains 76 million blocks. After we ran the 
setrep command, the active NameNode printed many log lines like this:
{code:java}
PendingReconstructionMonitor timed out blk_xxxx_260285131{code}


> Large scale block transfer causes too many excess blocks
> --------------------------------------------------------
>
>                 Key: HDFS-16989
>                 URL: https://issues.apache.org/jira/browse/HDFS-16989
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.4.0, 3.3.5
>            Reporter: farmmamba
>            Priority: Critical



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

