[jira] [Comment Edited] (HDFS-16613) EC: Improve performance of decommissioning dn with many ec blocks

caozhiqiang (Jira) Mon, 06 Jun 2022 20:47:07 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-16613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550782#comment-17550782
 ]


caozhiqiang edited comment on HDFS-16613 at 6/7/22 3:46 AM:
------------------------------------------------------------

In my cluster tests, the following optimizations maximized the IO 
performance of the decommissioning DN and reduced the time spent 
decommissioning a DN from 3 hours to half an hour.
 # Apply this patch
 # Increase the value of dfs.namenode.replication.max-streams-hard-limit
 # Decrease the value of dfs.namenode.reconstruction.pending.timeout-sec to 
shorten the interval at which pendingReconstructions is checked.
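
For reference, the two settings in the list above are NameNode properties 
set in hdfs-site.xml. The fragment below is only a sketch: the values 
are illustrative assumptions, not the values used in the test described here, 
and should be tuned against the cluster's network capacity.

```xml
<!-- hdfs-site.xml on the NameNode; values are hypothetical examples -->
<property>
  <name>dfs.namenode.replication.max-streams-hard-limit</name>
  <!-- raised so that more transfers can be scheduled per DN -->
  <value>8</value>
</property>
<property>
  <name>dfs.namenode.reconstruction.pending.timeout-sec</name>
  <!-- lowered so that timed-out pending reconstructions are rechecked sooner -->
  <value>60</value>
</property>
```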

!image-2022-06-07-11-46-42-389.png|width=552,height=165!



> EC: Improve performance of decommissioning dn with many ec blocks
> -----------------------------------------------------------------
>
>                 Key: HDFS-16613
>                 URL: https://issues.apache.org/jira/browse/HDFS-16613
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: ec, erasure-coding, namenode
>    Affects Versions: 3.4.0
>            Reporter: caozhiqiang
>            Assignee: caozhiqiang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2022-06-07-11-46-42-389.png
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In an HDFS cluster with many EC blocks, decommissioning a DN is very slow. 
> Unlike replicated blocks, which can be copied from any DN that holds a 
> replica of the same block, EC blocks have to be copied from the 
> decommissioning DN itself.
> The configurations dfs.namenode.replication.max-streams and 
> dfs.namenode.replication.max-streams-hard-limit limit the replication 
> speed, but increasing them puts the whole cluster's network at risk. 
> A new configuration should therefore be added to limit the decommissioning 
> DN separately from the cluster-wide max-streams limit.
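
The idea in the description above (a separate stream limit for the 
decommissioning DN, distinct from the cluster-wide limit) can be illustrated 
with a toy model. This is a sketch under stated assumptions, not the code of 
the patch: the names DataNode, max_new_transfers, cluster_max_streams and 
decommission_max_streams are hypothetical and do not come from HDFS.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    """Hypothetical stand-in for a DN as seen by the scheduler."""
    name: str
    decommissioning: bool
    xmits_in_progress: int  # transfers already running on this node


def max_new_transfers(node, cluster_max_streams, decommission_max_streams):
    """Return how many additional transfers may be scheduled on `node`.

    A decommissioning node is checked against its own limit, so raising
    that limit does not change the budget of any other node.
    """
    if node.decommissioning:
        limit = decommission_max_streams
    else:
        limit = cluster_max_streams
    # Never return a negative budget when the node is already over its limit.
    return max(0, limit - node.xmits_in_progress)


if __name__ == "__main__":
    normal = DataNode("dn1", decommissioning=False, xmits_in_progress=1)
    leaving = DataNode("dn2", decommissioning=True, xmits_in_progress=1)
    print(max_new_transfers(normal, 2, 16))
    print(max_new_transfers(leaving, 2, 16))
```

With a cluster-wide limit of 2 and a decommissioning limit of 16 (both 
made-up numbers), only the node that is leaving the cluster gets the larger 
budget, which is the separation the description asks for.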



