[ https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
AMC-team updated HDFS-15438:
----------------------------
    Description:
In the HDFS disk balancer, the config parameter "dfs.disk.balancer.max.disk.errors" controls the maximum number of errors that can be ignored for a specific move between two disks before the move is abandoned.

The parameter accepts any value >= 0, and setting it to 0 should mean no error tolerance. However, setting it to 0 simply prevents any block from being copied, because the while loop condition *item.getErrorCount() < getMaxError(item)* is never satisfied: an error count of 0 is not less than a maximum of 0.
{code:java}
// Gets the next block that we can copy
private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter,
                                     DiskBalancerWorkItem item) {
  while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) { // getMaxError = 0
    try {
      ... // get the block
    } catch (IOException e) {
      item.incErrorCount();
    }
  }
  if (item.getErrorCount() >= getMaxError(item)) {
    item.setErrMsg("Error count exceeded.");
    LOG.info("Maximum error count exceeded. Error count: {} Max error:{} ",
        item.getErrorCount(), item.getMaxDiskErrors());
  }
  ...
}
{code}
*How to fix*

Change the while loop condition to support value 0.

  was:
In the HDFS disk balancer, the config parameter "dfs.disk.balancer.max.disk.errors" controls the maximum number of errors that can be ignored for a specific move between two disks before the move is abandoned.

The parameter accepts any value >= 0, and setting it to 0 should mean no error tolerance. However, setting it to 0 simply prevents any block from being copied, because the while loop condition *item.getErrorCount() < getMaxError(item)* is never satisfied: an error count of 0 is not less than a maximum of 0.
{code:java}
// Gets the next block that we can copy
private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter,
                                     DiskBalancerWorkItem item) {
  while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) { // getMaxError = 0
    try {
      ... // get the block
    } catch (IOException e) {
      item.incErrorCount();
    }
  }
  if (item.getErrorCount() >= getMaxError(item)) {
    item.setErrMsg("Error count exceeded.");
    LOG.info("Maximum error count exceeded. Error count: {} Max error:{} ",
        item.getErrorCount(), item.getMaxDiskErrors());
  }
  ...
}
{code}
*How to fix*

Change the while loop condition and the following if statement condition to support value 0.

> dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
> --------------------------------------------------------------
>
>                 Key: HDFS-15438
>                 URL: https://issues.apache.org/jira/browse/HDFS-15438
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>            Reporter: AMC-team
>            Priority: Major
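
The effect of the loop condition described in this issue can be reproduced outside Hadoop. The following is a minimal sketch, not the DiskBalancer code: the class MaxDiskErrorsSketch, the method attempts, and the boolean "readable" flags are hypothetical stand-ins for the block iterator and for reads that throw IOException. It compares the strict condition quoted above with an inclusive one, which is one possible way to support 0 and not necessarily the change that would be committed.

```java
public class MaxDiskErrorsSketch {

    /**
     * Counts how many blocks the loop tries to read before it stops.
     * Hypothetical model: readable[i] == false stands for a read that
     * throws IOException and increments the error count.
     *
     * @param maxErrors models dfs.disk.balancer.max.disk.errors
     * @param inclusive false models "errorCount < max" (the condition in
     *                  the issue), true models "errorCount <= max"
     */
    static int attempts(long maxErrors, boolean inclusive, boolean... readable) {
        long errorCount = 0;
        int attempts = 0;
        int i = 0;
        while (i < readable.length
                && (inclusive ? errorCount <= maxErrors : errorCount < maxErrors)) {
            attempts++;
            if (!readable[i]) {
                errorCount++; // models item.incErrorCount() in the catch block
            }
            i++;
        }
        return attempts;
    }

    public static void main(String[] args) {
        // Three blocks; the second one fails to read.
        System.out.println(attempts(0, false, true, false, true)); // prints 0
        System.out.println(attempts(0, true, true, false, true));  // prints 2
        System.out.println(attempts(1, false, true, false, true)); // prints 2
        System.out.println(attempts(1, true, true, false, true));  // prints 3
    }
}
```

With the strict condition and a maximum of 0, no block is ever attempted. Note that switching to the inclusive condition also changes the behavior for every other value: a maximum of 1 then tolerates one error instead of stopping at the first one, so any such change would need to be checked against the documented meaning of the parameter.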