[ 
https://issues.apache.org/jira/browse/HDFS-15610?focusedWorklogId=496239&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-496239
 ]

ASF GitHub Bot logged work on HDFS-15610:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Oct/20 23:57
            Start Date: 06/Oct/20 23:57
    Worklog Time Spent: 10m 
      Work Description: karthikhw opened a new pull request #2365:
URL: https://github.com/apache/hadoop/pull/2365


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HADOOP-XXXXX. Fix a typo in YYY.)
   For more details, please see 
https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 496239)
    Remaining Estimate: 0h
            Time Spent: 10m

> Reduce datanode upgrade/hardlink thread
> ---------------------------------------
>
>                 Key: HDFS-15610
>                 URL: https://issues.apache.org/jira/browse/HDFS-15610
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.0.0, 3.1.4
>            Reporter: Karthik Palanisamy
>            Assignee: Karthik Palanisamy
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Upgrading a datanode incurs significant kernel overhead. On a datanode with 
> millions of blocks and 10+ disks, the block-layout migration becomes very 
> expensive during its hardlink phase. The slowness is observed when running 
> with a large number of hardlink threads 
> (dfs.datanode.block.id.layout.upgrade.threads, default 12 threads per disk), 
> and the upgrade can run for 2+ hours. With 10 disks, that is 10*12 = 120 threads.
> Small test:
> RHEL7, 32 cores, 20 GB RAM, 8 GB DN heap
> ||dfs.datanode.block.id.layout.upgrade.threads||Blocks||Disks||Time taken||
> |12|3.3 Million|1|2 minutes and 59 seconds|
> |6|3.3 Million|1|2 minutes and 35 seconds|
> |3|3.3 Million|1|2 minutes and 51 seconds|
> The same test was run twice and the results were about 95% consistent (only a 
> few seconds' difference between iterations). Using 6 threads is faster than 
> 12 threads because of the overhead of the additional threads. 
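A minimal Python sketch of the parallel hardlink step described above, assuming each disk is migrated by its own pool of N worker threads (the N that dfs.datanode.block.id.layout.upgrade.threads controls). This is not the DataNode's Java implementation: the helper names (total_upgrade_threads, block_subdir, upgrade_disk) are hypothetical, and the two-level block-ID-to-subdir mapping is an illustrative assumption rather than a statement of the exact on-disk layout.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def total_upgrade_threads(num_disks: int, threads_per_disk: int = 12) -> int:
    """Threads running at once when every disk gets its own pool (12 per disk by default, per the issue)."""
    return num_disks * threads_per_disk


def block_subdir(block_id: int) -> str:
    # Hypothetical two-level mapping (32 x 32 subdirs) derived from bits of the block ID.
    d1 = (block_id >> 16) & 0x1F
    d2 = (block_id >> 8) & 0x1F
    return os.path.join(f"subdir{d1}", f"subdir{d2}")


def _link_one(src_dir: str, dst_dir: str, name: str) -> None:
    # Works for both "blk_<id>" and "blk_<id>_<genstamp>.meta" file names.
    block_id = int(name.split("_")[1])
    target_dir = os.path.join(dst_dir, block_subdir(block_id))
    os.makedirs(target_dir, exist_ok=True)  # tolerates concurrent creation by other workers
    # A hard link copies no data; each call is a metadata-only syscall.
    os.link(os.path.join(src_dir, name), os.path.join(target_dir, name))


def upgrade_disk(src_dir: str, dst_dir: str, threads: int) -> int:
    """Hard-link every blk_* file in src_dir into the new layout using `threads` workers."""
    names = [n for n in os.listdir(src_dir) if n.startswith("blk_")]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() waits for all links and re-raises the first worker exception, if any.
        list(pool.map(lambda n: _link_one(src_dir, dst_dir, n), names))
    return len(names)


if __name__ == "__main__":
    print(total_upgrade_threads(10))  # 120
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "current")
        os.makedirs(src)
        for bid in (1073741825, 1073741826, 1073807361):
            open(os.path.join(src, f"blk_{bid}"), "w").close()
        print(upgrade_disk(src, os.path.join(root, "upgraded"), threads=6))  # 3
```

Because every link is a small metadata operation against the same filesystem, adding workers beyond what the disk and kernel can service mostly adds contention, which is consistent with the 6-thread run beating the 12-thread run in the table above.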



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
