[ https://issues.apache.org/jira/browse/HDFS-16613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551381#comment-17551381 ]

caozhiqiang commented on HDFS-16613:
------------------------------------

[~hadachi], in my cluster, 
dfs.namenode.replication.max-streams-hard-limit=512 and 
dfs.namenode.replication.work.multiplier.per.iteration=20.

The data flow is as follows (a simplified sketch of this flow follows the 
list):
 # Choose the blocks to be reconstructed from neededReconstruction. This step 
uses dfs.namenode.replication.work.multiplier.per.iteration to limit the 
number of blocks processed per iteration.
 # *Choose the source datanode. This step uses 
dfs.namenode.replication.max-streams-hard-limit to limit the number of 
streams per node.*
 # Choose the target datanode.
 # Add the reconstruction task to the datanode.
 # Put the blocks to be replicated into pendingReconstruction. If blocks in 
pendingReconstruction time out, they are put back into neededReconstruction 
and processed again. *This step uses 
dfs.namenode.reconstruction.pending.timeout-sec to control the timeout 
interval.*
 # *Send the command to the DN in the heartbeat response. Originally, 
dfs.namenode.decommission.max-streams was used here to limit the task 
number.*
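
A minimal, hypothetical sketch of how each of these settings gates one stage 
of the flow. This is a simplified model, not the actual 
BlockManager/DatanodeManager code; the class, method names and values are 
illustrative only:

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical model of the flow above; real logic lives in BlockManager/DatanodeManager.
public class ReconstructionFlowSketch {
  // Illustrative values; normally read from hdfs-site.xml.
  static final int WORK_MULTIPLIER = 20;   // dfs.namenode.replication.work.multiplier.per.iteration
  static final int HARD_LIMIT = 512;       // dfs.namenode.replication.max-streams-hard-limit
  static final long PENDING_TIMEOUT_MS = 300_000L; // dfs.namenode.reconstruction.pending.timeout-sec

  static class PendingBlock {
    final String blockId;
    final long addedAtMs;
    PendingBlock(String blockId, long addedAtMs) {
      this.blockId = blockId;
      this.addedAtMs = addedAtMs;
    }
  }

  final Queue<String> neededReconstruction = new ArrayDeque<>();
  final Queue<PendingBlock> pendingReconstruction = new ArrayDeque<>();

  // Step 1: pick at most multiplier * liveDatanodes blocks per iteration.
  void computeWork(int liveDatanodes, int xmitsInProgressOnSource) {
    int blocksToProcess = WORK_MULTIPLIER * liveDatanodes;
    for (int i = 0; i < blocksToProcess && !neededReconstruction.isEmpty(); i++) {
      String block = neededReconstruction.poll();
      // Step 2: a source DN that has already reached the hard limit is skipped.
      if (xmitsInProgressOnSource >= HARD_LIMIT) {
        neededReconstruction.add(block); // retried in a later iteration
        continue;
      }
      // Steps 3, 4 and 6: choose a target and send the task to the DN (omitted).
      // Step 5: track the block until the DN reports success or it times out.
      pendingReconstruction.add(new PendingBlock(block, System.currentTimeMillis()));
    }
  }

  // Step 5: timed-out blocks go back to neededReconstruction and are reprocessed.
  void requeueTimedOut(long nowMs) {
    while (!pendingReconstruction.isEmpty()
        && nowMs - pendingReconstruction.peek().addedAtMs > PENDING_TIMEOUT_MS) {
      neededReconstruction.add(pendingReconstruction.poll().blockId);
    }
  }
}
{code}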

First, process 1 is not the performance bottleneck.

The bottleneck is in processes 2, 5 and 6. So we should increase the value of 
dfs.namenode.replication.max-streams-hard-limit and decrease the value of 
dfs.namenode.reconstruction.pending.timeout-sec. For process 6, we should use 
dfs.namenode.replication.max-streams-hard-limit to limit the task number, as 
in the snippet below.

 
{code:java}
// DatanodeManager::handleHeartbeat
// For a decommissioning DN, compute the number of new transfer tasks from the
// replication-streams hard limit instead of the regular max-streams limit,
// minus the transfers the DN already has in flight.
if (nodeinfo.isDecommissionInProgress()) {
  maxTransfers = blockManager.getReplicationStreamsHardLimit()
      - xmitsInProgress;
} else {
  maxTransfers = blockManager.getMaxReplicationStreams()
      - xmitsInProgress;
}
{code}
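
To make the tuning above concrete, a minimal sketch follows. It only 
illustrates which keys are involved: the 512-stream hard limit is the value 
from my cluster quoted above, while the 120s timeout is just an illustrative 
"decreased" value, not a recommendation. In a real deployment these 
properties are set in hdfs-site.xml on the NameNode, not in code.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class DecommissionTuningSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Processes 2 and 6: raise the per-node hard limit used for decommissioning sources.
    conf.setInt("dfs.namenode.replication.max-streams-hard-limit", 512);
    // Process 5: shorten the pending-reconstruction timeout so timed-out tasks are retried sooner.
    conf.setInt("dfs.namenode.reconstruction.pending.timeout-sec", 120);
    // Process 1: blocks scheduled per iteration = multiplier * number of live datanodes.
    conf.setInt("dfs.namenode.replication.work.multiplier.per.iteration", 20);
    System.out.println("hard limit = "
        + conf.getInt("dfs.namenode.replication.max-streams-hard-limit", -1));
  }
}
{code}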
The graphs below show the under-replicated blocks and pending replication 
blocks metrics, which illustrate the performance bottleneck: a lot of blocks 
timed out in pendingReconstruction and were put back into neededReconstruction 
repeatedly. The first graph is before the optimization and the second is 
after it.

Please help to check this process, thank you.

 

!image-2022-06-08-11-41-11-127.png|width=932,height=190!

!image-2022-06-08-11-38-29-664.png|width=931,height=175!
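
For reference, the two metrics plotted above can be read from the NameNode's 
/jmx servlet. A minimal probe is sketched below; the host/port and the exact 
metric names (UnderReplicatedBlocks vs. LowRedundancyBlocks, 
PendingReplicationBlocks vs. PendingReconstructionBlocks) vary by release and 
cluster, so treat them as assumptions:

{code:java}
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class NameNodeMetricsProbe {
  public static void main(String[] args) throws Exception {
    // Assumed NameNode HTTP address; adjust to your cluster.
    String url = "http://namenode-host:9870/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem";
    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
    String body = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    // Crude scan of the JSON for the two metrics plotted in the graphs above;
    // a real monitor would parse the JSON and push the values into a TSDB.
    for (String field : body.split(",")) {
      if (field.contains("UnderReplicatedBlocks") || field.contains("LowRedundancyBlocks")
          || field.contains("PendingReplicationBlocks")
          || field.contains("PendingReconstructionBlocks")) {
        System.out.println(field.trim());
      }
    }
  }
}
{code}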

> EC: Improve performance of decommissioning dn with many ec blocks
> -----------------------------------------------------------------
>
>                 Key: HDFS-16613
>                 URL: https://issues.apache.org/jira/browse/HDFS-16613
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: ec, erasure-coding, namenode
>    Affects Versions: 3.4.0
>            Reporter: caozhiqiang
>            Assignee: caozhiqiang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2022-06-07-11-46-42-389.png, 
> image-2022-06-07-17-42-16-075.png, image-2022-06-07-17-45-45-316.png, 
> image-2022-06-07-17-51-04-876.png, image-2022-06-07-17-55-40-203.png, 
> image-2022-06-08-11-38-29-664.png, image-2022-06-08-11-41-11-127.png
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In an HDFS cluster with a lot of EC blocks, decommissioning a DN is very 
> slow. The reason is that, unlike replicated blocks, which can be copied from 
> any DN holding a replica of the same block, an EC block has to be replicated 
> from the decommissioning DN itself.
> The configurations dfs.namenode.replication.max-streams and 
> dfs.namenode.replication.max-streams-hard-limit limit the replication speed, 
> but increasing them creates risk for the whole cluster's network. So a new 
> configuration should be added to limit the decommissioning DN, distinguished 
> from the cluster-wide max-streams limit.


