[
https://issues.apache.org/jira/browse/HDFS-15440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
AMC-team updated HDFS-15440:
----------------------------
Attachment: HDFS-15440.000.patch
Status: Patch Available (was: Open)
> The usage of dfs.disk.balancer.block.tolerance.percent is not intuitive.
> --------------------------------------------------------------------------
>
> Key: HDFS-15440
> URL: https://issues.apache.org/jira/browse/HDFS-15440
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer & mover
> Reporter: AMC-team
> Priority: Major
> Attachments: HDFS-15440.000.patch
>
>
> In the HDFS disk balancer, the configuration parameter
> "dfs.disk.balancer.block.tolerance.percent" sets a percentage (e.g. 10
> means 10%) that defines a good enough move.
> The description in hdfs-default.xml does not make it clear to me how the
> value is actually calculated and applied:
> {quote}When a disk balancer copy operation is proceeding, the datanode is
> still active. So it might not be possible to move the exactly specified
> amount of data. So tolerance allows us to define a percentage which defines a
> good enough move.
> {quote}
> So I referred to the [official doc of the HDFS disk
> balancer|https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html],
> whose description is:
> {quote}The tolerance percent specifies when we have reached a good enough
> value for any copy step. For example, if you specify 10 then getting close to
> 10% of the target value is good enough. It is to say if the move operation is
> 20GB in size, if we can move 18GB (20 * (1-10%)) that operation is considered
> successful.
> {quote}
> However, the source code in DiskBalancer.java reads:
> {code:java}
> // Inflates bytesCopied and returns true or false. This allows us to stop
> // copying if we have reached close enough.
> private boolean isCloseEnough(DiskBalancerWorkItem item) {
>   long temp = item.getBytesCopied() +
>       ((item.getBytesCopied() * getBlockTolerancePercentage(item)) / 100);
>   return (item.getBytesToCopy() >= temp) ? false : true;
> }
> {code}
> Here, if item.getBytesToCopy() = 20GB and item.getBytesCopied() = 18GB, the
> move is still not considered close enough, because 20 > 18 + 18*0.1 = 19.8.
> Instead, we should check whether 18 >= 20*(1-0.1), which is what the
> documentation describes.
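> A minimal sketch of what the check could look like if it matched the
> documented semantics (identifiers are taken from the snippet above; this is
> only an illustration of the suggestion, not the attached patch):
> {code:java}
> // Returns true once bytesCopied has reached (100 - tolerance)% of
> // bytesToCopy, so copying 18GB out of 20GB with a 10% tolerance counts
> // as close enough, matching the example in the documentation.
> private boolean isCloseEnough(DiskBalancerWorkItem item) {
>   long threshold = item.getBytesToCopy() -
>       ((item.getBytesToCopy() * getBlockTolerancePercentage(item)) / 100);
>   return item.getBytesCopied() >= threshold;
> }
> {code}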
> The calculation in isLessThanNeeded() (which checks whether a given block is
> smaller than the size still needed to meet our goal) is unintuitive in the
> same way.
> Also, this parameter has no upper-bound check, so it can even be set to
> 1000000%, which is obviously an invalid value.
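> For the bound issue, a possible guard could look like the following sketch
> (the helper name and its placement in DiskBalancer.java are hypothetical,
> not part of the existing code or of the attached patch):
> {code:java}
> // Hypothetical validation: reject tolerance values outside (0, 100].
> private long getValidatedTolerance(DiskBalancerWorkItem item) {
>   long tolerance = getBlockTolerancePercentage(item);
>   if (tolerance <= 0 || tolerance > 100) {
>     throw new IllegalArgumentException(
>         "dfs.disk.balancer.block.tolerance.percent must be in (0, 100], got "
>         + tolerance);
>   }
>   return tolerance;
> }
> {code}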
> *How to fix*
> Although this may not lead to a severe failure, it is better to make the
> documentation and the code consistent, and also to refine the description in
> hdfs-default.xml so that it is more precise and clear.