[
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164713#comment-17164713
]
AMC-team commented on HDFS-15438:
---------------------------------
Thanks [~ayushtkn] for the feedback. I have uploaded a patch that changes the
while loop condition and the if condition to support the value 0.
What's more, IMHO, the logic after the patch may be more intuitive. Previously,
if we set dfs.disk.balancer.max.disk.errors to n, a move could actually
tolerate only n-1 errors. Now it can tolerate n errors, which is more
consistent with the parameter's documentation:
{quote}During a block move from a source to destination disk, we might
encounter various errors. *This defines how many errors we can tolerate* before
we declare a move between 2 disks (or a step) has failed.
{quote}
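To illustrate the difference, here is a minimal standalone sketch. It is not the Hadoop code or the attached patch: the class name MaxErrorSemantics, the attempts helper, and the "<=" guard are assumptions made only to compare a strict guard with an inclusive one. Each array entry stands for one block fetch, and true means the fetch threw an IOException.

```java
class MaxErrorSemantics {

    // Counts how many loop iterations run for a sequence of fetch outcomes.
    // inclusive == false mimics the guard "errorCount < maxErrors";
    // inclusive == true mimics a hypothetical guard "errorCount <= maxErrors".
    static int attempts(boolean[] fails, int maxErrors, boolean inclusive) {
        int errorCount = 0;
        int attempts = 0;
        int i = 0;
        while (i < fails.length
                && (inclusive ? errorCount <= maxErrors : errorCount < maxErrors)) {
            attempts++;
            if (fails[i]) {
                errorCount++; // stands in for item.incErrorCount()
            }
            i++;
        }
        return attempts;
    }

    public static void main(String[] args) {
        boolean[] noErrors = {false, false, false};
        boolean[] allErrors = {true, true, true, true};

        // maxErrors = 0, no errors at all
        System.out.println(attempts(noErrors, 0, false));  // 0: loop never runs
        System.out.println(attempts(noErrors, 0, true));   // 3: every fetch runs

        // maxErrors = 2, every fetch fails
        System.out.println(attempts(allErrors, 2, false)); // 2: stops at the 2nd error
        System.out.println(attempts(allErrors, 2, true));  // 3: stops at the 3rd error
    }
}
```

With the strict guard and n = 2, the loop gives up as soon as the second error occurs, so only one error is really tolerated. With the inclusive guard it continues after two errors and gives up on the third.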
> Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
> ----------------------------------------------------------------------
>
> Key: HDFS-15438
> URL: https://issues.apache.org/jira/browse/HDFS-15438
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer & mover
> Reporter: AMC-team
> Priority: Major
> Attachments: HDFS-15438.000.patch, HDFS-15438.001.patch
>
>
> In the HDFS disk balancer, the config parameter
> "dfs.disk.balancer.max.disk.errors" controls the maximum number of errors
> we can ignore for a specific move between two disks before it is
> abandoned.
> The parameter accepts any value >= 0, and setting it to 0 should mean no
> error tolerance. However, with the value set to 0 the block copy is never
> attempted, even when no disk error occurs, because the while loop
> condition *item.getErrorCount() < getMaxError(item)* is never satisfied.
> {code:java}
> // Gets the next block that we can copy
> private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter,
> DiskBalancerWorkItem item) {
> while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) {
> try {
> ... //get the block
> } catch (IOException e) {
> item.incErrorCount();
> }
> if (item.getErrorCount() >= getMaxError(item)) {
> item.setErrMsg("Error count exceeded.");
> LOG.info("Maximum error count exceeded. Error count: {} Max error:{} ",
> item.getErrorCount(), item.getMaxDiskErrors());
> }
> {code}
> *How to fix*
> Change the while loop condition to support value 0.
>