[ https://issues.apache.org/jira/browse/HDFS-15440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164723#comment-17164723 ]
Hadoop QA commented on HDFS-15440:
----------------------------------
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 26s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15440 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008387/HDFS-15440.000.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29561/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
This message was automatically generated.
> The usage of dfs.disk.balancer.block.tolerance.percent is not intuitive.
> --------------------------------------------------------------------------
>
> Key: HDFS-15440
> URL: https://issues.apache.org/jira/browse/HDFS-15440
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer & mover
> Reporter: AMC-team
> Priority: Major
> Attachments: HDFS-15440.000.patch
>
>
> In the HDFS disk balancer, the configuration parameter
> "dfs.disk.balancer.block.tolerance.percent" sets a percentage (e.g. 10
> means 10%) that defines a good enough move.
> The description in hdfs-default.xml does not make clear to me how the value
> is actually calculated and applied:
> {quote}When a disk balancer copy operation is proceeding, the datanode is
> still active. So it might not be possible to move the exactly specified
> amount of data. So tolerance allows us to define a percentage which defines a
> good enough move.
> {quote}
> So I referred to the [official doc of HDFS disk
> balancer|https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html],
> whose description is:
> {quote}The tolerance percent specifies when we have reached a good enough
> value for any copy step. For example, if you specify 10 then getting close to
> 10% of the target value is good enough. It is to say if the move operation is
> 20GB in size, if we can move 18GB (20 * (1-10%)) that operation is considered
> successful.
> {quote}
> However, the source code in DiskBalancer.java behaves differently:
> {code:java}
> // Inflates bytesCopied and returns true or false. This allows us to stop
> // copying if we have reached close enough.
> private boolean isCloseEnough(DiskBalancerWorkItem item) {
>   long temp = item.getBytesCopied() +
>       ((item.getBytesCopied() * getBlockTolerancePercentage(item)) / 100);
>   return (item.getBytesToCopy() >= temp) ? false : true;
> }
> {code}
> Here, if item.getBytesToCopy() = 20GB, then item.getBytesCopied() = 18GB is
> still not enough, because 20 > 18 + 18*0.1 = 19.8.
> To match the documentation, the check should instead be whether
> 18 >= 20*(1-0.1).
> The calculation in isLessThanNeeded() ("Checks if a given block is less than
> needed size to meet our goal.") is unintuitive in the same way.
> Also, this parameter has no upper bound check, which means you can even
> set it to 1000000%, which is obviously a wrong value.
> *How to fix*
> Although this may not lead to a severe failure, it is better to make the doc
> and the code consistent, and also to refine the description in
> hdfs-default.xml to make it more precise and clear.
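
The mismatch described in the issue can be reproduced with a small standalone sketch. This is not the Hadoop code: the class ToleranceCheck and all of its methods are hypothetical names introduced here for illustration. isCloseEnoughCurrent mirrors the arithmetic quoted from DiskBalancer.java, isCloseEnoughDocumented follows the wording of the official doc, and validateTolerance assumes that 0 to 100 is the sensible range, which neither the issue nor the doc states.

```java
// Standalone illustration; none of these names exist in Hadoop.
final class ToleranceCheck {
  static final long GB = 1024L * 1024L * 1024L;

  // Same arithmetic as the quoted isCloseEnough(): inflate bytesCopied by the
  // tolerance and report success only once it exceeds bytesToCopy.
  static boolean isCloseEnoughCurrent(long bytesToCopy, long bytesCopied,
      long tolerancePercent) {
    long inflated = bytesCopied + (bytesCopied * tolerancePercent) / 100;
    return bytesToCopy < inflated;
  }

  // Documented reading: bytesCopied >= bytesToCopy * (1 - tolerance/100),
  // rearranged to stay in integer arithmetic.
  static boolean isCloseEnoughDocumented(long bytesToCopy, long bytesCopied,
      long tolerancePercent) {
    return bytesCopied * 100 >= bytesToCopy * (100 - tolerancePercent);
  }

  // Assumed bound check: the [0, 100] range is an assumption made here.
  static long validateTolerance(long tolerancePercent) {
    if (tolerancePercent < 0 || tolerancePercent > 100) {
      throw new IllegalArgumentException(
          "tolerance percent must be between 0 and 100, got " + tolerancePercent);
    }
    return tolerancePercent;
  }

  public static void main(String[] args) {
    long toCopy = 20 * GB;
    long copied = 18 * GB;
    System.out.println(isCloseEnoughCurrent(toCopy, copied, 10));    // false
    System.out.println(isCloseEnoughDocumented(toCopy, copied, 10)); // true
  }
}
```

With a 20GB move, 18GB copied, and a tolerance of 10, the first check still reports "not close enough" while the second matches the example in the official doc.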