[jira] [Commented] (HDFS-16613) EC: Improve performance of decommissioning dn with many ec blocks

Hiroyuki Adachi (Jira) Tue, 07 Jun 2022 04:20:07 -0700


    [ https://issues.apache.org/jira/browse/HDFS-16613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550940#comment-17550940 ]


Hiroyuki Adachi commented on HDFS-16613:
----------------------------------------

[~caozhiqiang], thank you for your explanation. It looks good.

Now I understand that blocksToProcess controls the number of replication 
tasks, so if it is less than dfs.namenode.replication.max-streams-hard-limit, 
all blocks on the decommissioning node are replicated rather than reconstructed.

Could you please tell me the values of 
dfs.namenode.replication.max-streams-hard-limit and 
dfs.namenode.replication.work.multiplier.per.iteration?

 
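{code:java}
// BlockManager#computeDatanodeWork

// numlive: number of live DataNodes.
// blocksReplWorkMultiplier: dfs.namenode.replication.work.multiplier.per.iteration
final int blocksToProcess = numlive
    * this.blocksReplWorkMultiplier;
// blocksInvalidateWorkPct: dfs.namenode.invalidate.work.pct.per.iteration
final int nodesToProcess = (int) Math.ceil(numlive
    * this.blocksInvalidateWorkPct);

// Up to blocksToProcess low-redundancy blocks are picked for work this round.
int workFound = this.computeBlockReconstructionWork(blocksToProcess);
{code}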
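The arithmetic in the snippet, together with the comparison against the hard limit described in this comment, can be reproduced outside the NameNode. This is a minimal sketch, assuming illustrative values (10 live DataNodes, a multiplier of 2, an invalidate percentage of 0.32 and a hard limit of 4) rather than the values from the cluster in this issue. The class ComputeWorkBudget and its methods are hypothetical helpers, not HDFS APIs, and the hard-limit check encodes only the simplified reading stated in the comment, not the full source-selection logic in BlockManager.

```java
/**
 * Hypothetical stand-alone re-computation of the per-iteration budgets
 * from BlockManager#computeDatanodeWork. Not part of HDFS.
 */
final class ComputeWorkBudget {

    /** blocksToProcess = numlive * blocksReplWorkMultiplier, as in the snippet. */
    static int blocksToProcess(int numLive, int replWorkMultiplier) {
        return numLive * replWorkMultiplier;
    }

    /** nodesToProcess = ceil(numlive * blocksInvalidateWorkPct), as in the snippet. */
    static int nodesToProcess(int numLive, float invalidateWorkPct) {
        return (int) Math.ceil(numLive * invalidateWorkPct);
    }

    /**
     * Simplified reading from the comment: does one round's block budget
     * stay below dfs.namenode.replication.max-streams-hard-limit?
     */
    static boolean budgetBelowHardLimit(int blocksToProcess, int hardLimit) {
        return blocksToProcess < hardLimit;
    }

    public static void main(String[] args) {
        // Illustrative values only, not taken from the cluster in this issue.
        int numLive = 10;
        int multiplier = 2;           // dfs.namenode.replication.work.multiplier.per.iteration
        float invalidatePct = 0.32f;  // dfs.namenode.invalidate.work.pct.per.iteration
        int hardLimit = 4;            // dfs.namenode.replication.max-streams-hard-limit

        int blocks = blocksToProcess(numLive, multiplier);
        int nodes = nodesToProcess(numLive, invalidatePct);
        System.out.println("blocksToProcess = " + blocks);  // 20
        System.out.println("nodesToProcess = " + nodes);    // 4
        System.out.println("below hard limit: "
            + budgetBelowHardLimit(blocks, hardLimit));     // false
    }
}
```

With these values the per-round block budget (20) is already well above a hard limit of 4, which is why the actual values of both settings matter for the analysis.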
 

> EC: Improve performance of decommissioning dn with many ec blocks
> -----------------------------------------------------------------
>
>                 Key: HDFS-16613
>                 URL: https://issues.apache.org/jira/browse/HDFS-16613
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: ec, erasure-coding, namenode
>    Affects Versions: 3.4.0
>            Reporter: caozhiqiang
>            Assignee: caozhiqiang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2022-06-07-11-46-42-389.png, 
> image-2022-06-07-17-42-16-075.png, image-2022-06-07-17-45-45-316.png, 
> image-2022-06-07-17-51-04-876.png, image-2022-06-07-17-55-40-203.png
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In an HDFS cluster with many EC blocks, decommissioning a DN is very slow. 
> The reason is that, unlike replicated blocks, which can be copied from any DN 
> holding a replica of the same block, an EC block has to be replicated from the 
> decommissioning DN itself.
> The configurations dfs.namenode.replication.max-streams and 
> dfs.namenode.replication.max-streams-hard-limit limit the replication 
> speed, but increasing them puts the whole cluster's network at risk. So a 
> new configuration should be added to limit the decommissioning DN, 
> separate from the cluster-wide max-streams limit.
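The proposal can be pictured as a per-source throttle check. The following is a simplified, hypothetical model, not HDFS code: the class SourceThrottle, the field decommissionStreams and the method canUseAsSource are invented for illustration, and the issue text does not name the new configuration. The idea it shows is that a decommissioning DN is checked against its own limit while ordinary DNs keep the cluster-wide max-streams limit.

```java
/**
 * Hypothetical model of the proposal: ordinary source DataNodes keep the
 * cluster-wide max-streams limit, while decommissioning DataNodes get their
 * own, separately configured limit. Not HDFS code; all names are invented.
 */
final class SourceThrottle {
    private final int maxStreams;          // models dfs.namenode.replication.max-streams
    private final int decommissionStreams; // hypothetical decommissioning-only limit

    SourceThrottle(int maxStreams, int decommissionStreams) {
        if (maxStreams <= 0 || decommissionStreams <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.maxStreams = maxStreams;
        this.decommissionStreams = decommissionStreams;
    }

    /**
     * Returns true if a DataNode that already has pendingTasks replication
     * tasks may be picked as the source for one more task.
     */
    boolean canUseAsSource(int pendingTasks, boolean decommissioning) {
        int limit = decommissioning ? decommissionStreams : maxStreams;
        return pendingTasks < limit;
    }

    public static void main(String[] args) {
        SourceThrottle t = new SourceThrottle(2, 8);
        System.out.println(t.canUseAsSource(5, false)); // false: ordinary DN capped at 2
        System.out.println(t.canUseAsSource(5, true));  // true: decommissioning DN capped at 8
    }
}
```

Because the higher limit applies only to decommissioning DNs, raising it lets EC internal blocks drain faster from those nodes without raising the limit for every DataNode in the cluster.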



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
