[
https://issues.apache.org/jira/browse/HDFS-15440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
AMC-team updated HDFS-15440:
----------------------------
Attachment: HDFS-15440.000.patch
Status: Patch Available (was: Open)
> The usage of dfs.disk.balancer.block.tolerance.percent is not intuitive.
> --------------------------------------------------------------------------
>
> Key: HDFS-15440
> URL: https://issues.apache.org/jira/browse/HDFS-15440
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer & mover
> Reporter: AMC-team
> Priority: Major
> Attachments: HDFS-15440.000.patch
>
>
> In the HDFS disk balancer, the configuration parameter
> "dfs.disk.balancer.block.tolerance.percent" sets a percentage (e.g. 10
> means 10%) that defines a good enough move.
> The description in hdfs-default.xml does not make it clear to me how the
> value is actually calculated and applied:
> {quote}When a disk balancer copy operation is proceeding, the datanode is
> still active. So it might not be possible to move the exactly specified
> amount of data. So tolerance allows us to define a percentage which defines a
> good enough move.
> {quote}
> So I referred to the [official doc of the HDFS disk
> balancer|https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html],
> whose description is:
> {quote}The tolerance percent specifies when we have reached a good enough
> value for any copy step. For example, if you specify 10 then getting close to
> 10% of the target value is good enough. It is to say if the move operation is
> 20GB in size, if we can move 18GB (20 * (1-10%)) that operation is considered
> successful.
> {quote}
> However, the source code in DiskBalancer.java reads:
> {code:java}
> // Inflates bytesCopied and returns true or false. This allows us to stop
> // copying if we have reached close enough.
> private boolean isCloseEnough(DiskBalancerWorkItem item) {
>   long temp = item.getBytesCopied() +
>       ((item.getBytesCopied() * getBlockTolerancePercentage(item)) / 100);
>   return (item.getBytesToCopy() >= temp) ? false : true;
> }
> {code}
> Here, if item.getBytesToCopy() = 20GB and item.getBytesCopied() = 18GB, the
> move is still not considered close enough, because 20 > 18 + 18*0.1 = 19.8.
> Instead, we should check whether 18 >= 20*(1-0.1), which is what the
> documentation describes.
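> A minimal sketch of what the check could look like if it matched the
> documented semantics (identifiers are taken from the snippet above; this is
> only an illustration of the suggestion, not the attached patch):
> {code:java}
> // Returns true once bytesCopied has reached (100 - tolerance)% of
> // bytesToCopy, so copying 18GB out of 20GB with a 10% tolerance counts
> // as close enough, matching the example in the documentation.
> private boolean isCloseEnough(DiskBalancerWorkItem item) {
>   long threshold = item.getBytesToCopy() -
>       ((item.getBytesToCopy() * getBlockTolerancePercentage(item)) / 100);
>   return item.getBytesCopied() >= threshold;
> }
> {code}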
> The calculation in isLessThanNeeded() (which checks whether a given block is
> smaller than the size still needed to meet our goal) is unintuitive in the
> same way.
> Also, this parameter has no upper-bound check, so it can even be set to
> 1000000%, which is obviously an invalid value.
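> For the bound issue, a possible guard could look like the following sketch
> (the helper name and its placement in DiskBalancer.java are hypothetical,
> not part of the existing code or of the attached patch):
> {code:java}
> // Hypothetical validation: reject tolerance values outside (0, 100].
> private long getValidatedTolerance(DiskBalancerWorkItem item) {
>   long tolerance = getBlockTolerancePercentage(item);
>   if (tolerance <= 0 || tolerance > 100) {
>     throw new IllegalArgumentException(
>         "dfs.disk.balancer.block.tolerance.percent must be in (0, 100], got "
>         + tolerance);
>   }
>   return tolerance;
> }
> {code}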
> *How to fix*
> Although this may not lead to a severe failure, it is better to make the
> documentation and the code consistent, and also to refine the description in
> hdfs-default.xml so that it is more precise and clear.