[ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AMC-team updated HDFS-15438:
----------------------------
    Description: 
In the HDFS disk balancer, the config parameter 
"dfs.disk.balancer.max.disk.errors" controls the maximum number of errors that 
can be ignored for a specific move between two disks before the move is 
abandoned.

The parameter accepts any value >= 0, and setting it to 0 should mean no error 
tolerance. However, setting it to 0 simply skips the block copy, because the 
while loop condition *item.getErrorCount() < getMaxError(item)* is never 
satisfied.
{code:java}
// Gets the next block that we can copy
private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter,
                                     DiskBalancerWorkItem item) {
  // With getMaxError(item) == 0 this condition is already false on entry
  while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) {
    try {
      ... // get the block
    } catch (IOException e) {
      item.incErrorCount();
    }
  }
  if (item.getErrorCount() >= getMaxError(item)) {
    item.setErrMsg("Error count exceeded.");
    LOG.info("Maximum error count exceeded. Error count: {} Max error:{} ",
        item.getErrorCount(), item.getMaxDiskErrors());
  }
  ...
}
{code}
*How to fix*

Change the while loop condition to support value 0.
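
To illustrate the boundary case, here is a minimal stand-alone sketch. It does not use the real DiskBalancer classes: the class MaxErrorsGuardDemo, the method blocksVisited, and the convention that a null list entry stands in for a read that throws IOException are all assumptions made up for this example. The inclusive guard shown is only one possible way to support 0, not necessarily the patch that will be committed.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical model of the loop guard; not the actual DiskBalancer code. */
public class MaxErrorsGuardDemo {

  /**
   * Counts how many readable blocks the loop reaches before it stops.
   * A null entry simulates a block whose read throws an IOException.
   *
   * @param inclusive false models the reported guard (errorCount < maxErrors);
   *                  true models one possible fix (errorCount <= maxErrors)
   */
  public static int blocksVisited(List<String> blocks, int maxErrors,
                                  boolean inclusive) {
    Iterator<String> iter = blocks.iterator();
    int errorCount = 0;
    int visited = 0;
    while (iter.hasNext()
        && (inclusive ? errorCount <= maxErrors : errorCount < maxErrors)) {
      String block = iter.next();
      if (block == null) {
        errorCount++;  // stands in for item.incErrorCount()
      } else {
        visited++;     // stands in for handing a block to the copy step
      }
    }
    return visited;
  }

  public static void main(String[] args) {
    List<String> healthy = Arrays.asList("blk_1", "blk_2", "blk_3");
    List<String> oneBad = Arrays.asList("blk_1", null, "blk_3");

    // Strict guard: 0 < 0 is false, so the loop body never runs.
    System.out.println("strict, max=0, healthy: "
        + blocksVisited(healthy, 0, false));
    // Inclusive guard: 0 <= 0 is true, so all readable blocks are reached.
    System.out.println("inclusive, max=0, healthy: "
        + blocksVisited(healthy, 0, true));
    // Inclusive guard still stops at the first error when max is 0.
    System.out.println("inclusive, max=0, one bad: "
        + blocksVisited(oneBad, 0, true));
  }
}
```

In this model the strict guard with a limit of 0 reaches no blocks even when every read succeeds, while the inclusive guard reaches all of them and still stops at the first failure, which matches the "no error tolerance" reading of 0.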
  

  was:
In the HDFS disk balancer, the config parameter 
"dfs.disk.balancer.max.disk.errors" controls the maximum number of errors that 
can be ignored for a specific move between two disks before the move is 
abandoned.

The parameter accepts any value >= 0, and setting it to 0 should mean no error 
tolerance. However, setting it to 0 simply skips the block copy, because the 
while loop condition *item.getErrorCount() < getMaxError(item)* is never 
satisfied.
{code:java}
// Gets the next block that we can copy
private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter,
                                     DiskBalancerWorkItem item) {
  // With getMaxError(item) == 0 this condition is already false on entry
  while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) {
    try {
      ... // get the block
    } catch (IOException e) {
      item.incErrorCount();
    }
  }
  if (item.getErrorCount() >= getMaxError(item)) {
    item.setErrMsg("Error count exceeded.");
    LOG.info("Maximum error count exceeded. Error count: {} Max error:{} ",
        item.getErrorCount(), item.getMaxDiskErrors());
  }
  ...
}
{code}
*How to fix*

Change the while loop condition and the following if statement condition to 
support value 0.
  


> dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
> --------------------------------------------------------------
>
>                 Key: HDFS-15438
>                 URL: https://issues.apache.org/jira/browse/HDFS-15438
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>            Reporter: AMC-team
>            Priority: Major
>
> In the HDFS disk balancer, the config parameter 
> "dfs.disk.balancer.max.disk.errors" controls the maximum number of errors 
> that can be ignored for a specific move between two disks before the move is 
> abandoned.
> The parameter accepts any value >= 0, and setting it to 0 should mean no 
> error tolerance. However, setting it to 0 simply skips the block copy, 
> because the while loop condition *item.getErrorCount() < getMaxError(item)* 
> is never satisfied.
> {code:java}
> // Gets the next block that we can copy
> private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter,
>                                      DiskBalancerWorkItem item) {
>   // With getMaxError(item) == 0 this condition is already false on entry
>   while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) {
>     try {
>       ... // get the block
>     } catch (IOException e) {
>       item.incErrorCount();
>     }
>   }
>   if (item.getErrorCount() >= getMaxError(item)) {
>     item.setErrMsg("Error count exceeded.");
>     LOG.info("Maximum error count exceeded. Error count: {} Max error:{} ",
>         item.getErrorCount(), item.getMaxDiskErrors());
>   }
>   ...
> }
> {code}
> *How to fix*
> Change the while loop condition to support value 0.
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
