[jira] [Commented] (HDFS-15440) The usage of dfs.disk.balancer.block.tolerance.percent is not intuitive.

Hadoop QA (Jira) Fri, 24 Jul 2020 19:52:20 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-15440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164723#comment-17164723
 ]


Hadoop QA commented on HDFS-15440:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 26s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15440 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008387/HDFS-15440.000.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29561/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.



> The usage of dfs.disk.balancer.block.tolerance.percent is not intuitive.  
> --------------------------------------------------------------------------
>
>                 Key: HDFS-15440
>                 URL: https://issues.apache.org/jira/browse/HDFS-15440
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>            Reporter: AMC-team
>            Priority: Major
>         Attachments: HDFS-15440.000.patch
>
>
> In the HDFS disk balancer, the configuration parameter 
> "dfs.disk.balancer.block.tolerance.percent" sets a percentage (e.g. 10 
> means 10%) that defines a good enough move.
> The description in hdfs-default.xml does not make clear how the value is 
> actually calculated and applied:
> {quote}When a disk balancer copy operation is proceeding, the datanode is 
> still active. So it might not be possible to move the exactly specified 
> amount of data. So tolerance allows us to define a percentage which defines a 
> good enough move.
> {quote}
> So I referred to the [official doc of HDFS disk 
> balancer|https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html],
> where the description is:
> {quote}The tolerance percent specifies when we have reached a good enough 
> value for any copy step. For example, if you specify 10 then getting close to 
> 10% of the target value is good enough. It is to say if the move operation is 
> 20GB in size, if we can move 18GB (20 * (1-10%)) that operation is considered 
> successful.
> {quote}
> However, the source code in DiskBalancer.java does the following:
> {code:java}
> // Inflates bytesCopied and returns true or false. This allows us to stop
> // copying if we have reached close enough.
> private boolean isCloseEnough(DiskBalancerWorkItem item) {
>   long temp = item.getBytesCopied() +
>      ((item.getBytesCopied() * getBlockTolerancePercentage(item)) / 100);
>   return (item.getBytesToCopy() >= temp) ? false : true;
> }
> {code}
> Here, if item.getBytesToCopy() = 20GB, then item.getBytesCopied() = 18GB is 
> still not close enough, because 20 >= 18 + 18*0.1 = 19.8.
> To match the documentation, the check should be whether 18 >= 20*(1-0.1).
> The calculation in isLessThanNeeded() (which checks if a given block is less 
> than the needed size to meet our goal) is unintuitive in the same way.
> Also, this parameter has no upper bound check, which means you can even 
> set it to 1000000%, which is obviously a wrong value.
> *How to fix*
> Although this may not lead to a severe failure, it is better to make the doc 
> and the code consistent, and also to refine the description in 
> hdfs-default.xml to make it more precise and clear.
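
The mismatch described in the issue can be reproduced with a small standalone sketch. It is not the code from the attached patch: the class ToleranceCheck and both method names are hypothetical, the first method only mirrors the arithmetic of the quoted isCloseEnough(), and the second follows the documented example (20GB target, 10% tolerance, 18GB counts as done). The 0..100 bound is an assumption made for illustration, not something the issue specifies.

```java
final class ToleranceCheck {
    // Mirrors the quoted isCloseEnough(): inflate bytesCopied by the
    // tolerance and compare the result against the target.
    static boolean isCloseEnoughCurrent(long bytesToCopy, long bytesCopied,
                                        long tolerancePercent) {
        long inflated = bytesCopied + (bytesCopied * tolerancePercent) / 100;
        return bytesToCopy < inflated;
    }

    // Follows the documented example: shrink the target by the tolerance
    // and compare bytesCopied against that threshold.
    static boolean isCloseEnoughDocumented(long bytesToCopy, long bytesCopied,
                                           long tolerancePercent) {
        // Assumed bound: a tolerance outside 0..100 has no sensible meaning here.
        if (tolerancePercent < 0 || tolerancePercent > 100) {
            throw new IllegalArgumentException(
                "tolerance percent must be in [0, 100]: " + tolerancePercent);
        }
        long threshold = bytesToCopy - (bytesToCopy * tolerancePercent) / 100;
        return bytesCopied >= threshold;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        // 20GB target, 18GB copied, 10% tolerance.
        System.out.println(isCloseEnoughCurrent(20 * gb, 18 * gb, 10));    // false
        System.out.println(isCloseEnoughDocumented(20 * gb, 18 * gb, 10)); // true
    }
}
```

With the inflated-bytesCopied form, a 20GB move at 10% tolerance is only considered close enough once a little over 18.18GB has been copied (20 / 1.1), which is why the 18GB case from the documentation fails.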



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
