[ 
https://issues.apache.org/jira/browse/HDFS-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591260#comment-14591260
 ] 

Haohui Mai commented on HDFS-8617:
----------------------------------

When it comes to throttling, it always comes down to one question: what is the 
rationale for choosing these magic numbers?

Are these numbers based on the speed of the disks, the load on the cluster, or 
just something handy? Do these numbers fit the various configurations found in 
production deployments? 

Note that making them configurable does not address the problem. Picking the 
right magic numbers is already difficult; adjusting them to the load on the 
cluster is even more difficult, if not impossible. That is exactly why 
proposals like HDFS-7265 have been superseded by mechanisms like HDFS-7270 
that are tuned automatically at runtime.

For this particular use case I think it makes more sense to let the OS do the 
job. To keep {{checkDirs()}}-related calls from competing with normal DN 
requests, putting them into a thread that has a lower I/O priority should be 
sufficient. You can rely on the I/O queue of the OS for throttling and 
performance tuning.
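
A rough sketch of the threading half of this idea, in Java. Everything here is
hypothetical: the class name, {{newLowPriorityExecutor()}}, and the simplified
{{checkDirs()}} stand-in are not taken from the patch or from Hadoop. Note
also that {{Thread.setPriority()}} is only a CPU scheduling hint; Java has no
portable API for I/O priority, so actually lowering it would need something
OS-specific, for example the Linux {{ioprio_set}} system call via native code,
and whether it has any effect depends on the I/O scheduler in use.

```java
import java.io.File;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;

class LowPriorityDiskCheck {

    // Hypothetical helper: one dedicated background thread for disk checks,
    // so they never run on the threads that serve normal requests.
    static ExecutorService newLowPriorityExecutor() {
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, "disk-checker");
            t.setDaemon(true);
            // CPU scheduling hint only. This does NOT set I/O priority;
            // that would need an OS-specific call made from this thread.
            t.setPriority(Thread.MIN_PRIORITY);
            return t;
        };
        return Executors.newSingleThreadExecutor(factory);
    }

    // Simplified stand-in for a recursive directory check (an assumption,
    // not Hadoop's implementation). Returns the number of directories seen.
    static int checkDirs(File dir) throws IOException {
        if (!dir.isDirectory() || !dir.canRead()) {
            throw new IOException("Cannot read directory: " + dir);
        }
        int checked = 1;
        File[] children = dir.listFiles();
        if (children != null) {
            for (File child : children) {
                if (child.isDirectory()) {
                    checked += checkDirs(child);
                }
            }
        }
        return checked;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = newLowPriorityExecutor();
        File root = new File(args.length > 0 ? args[0] : ".");
        // The check is queued on the background thread; no sleeps or
        // rate constants are involved.
        Future<Integer> result = executor.submit(() -> checkDirs(root));
        System.out.println("checked " + result.get() + " directories");
        executor.shutdown();
    }
}
```

The point of the sketch is that it contains no magic numbers: the check simply
runs on its own thread, and how its I/O interleaves with other requests is
left to the OS.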


> Throttle DiskChecker#checkDirs() speed.
> ---------------------------------------
>
>                 Key: HDFS-8617
>                 URL: https://issues.apache.org/jira/browse/HDFS-8617
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: HDFS
>    Affects Versions: 2.7.0
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>         Attachments: HDFS-8617.000.patch
>
>
> As described in HDFS-8564,  {{DiskChecker.checkDirs(finalizedDir)}} is 
> causing excessive I/Os because {{finalizedDirs}} might have up to 64K 
> sub-directories (HDFS-6482).
> This patch proposes to limit the rate of IO operations in 
> {{DiskChecker.checkDirs()}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
