[ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161369#comment-17161369
 ] 

Ayush Saxena commented on HDFS-15438:
-------------------------------------

Thanks [~AMC-team] for the report.

Even if you get past the loop condition, you may still get stuck further down, at L926:
{code:java}
      if (item.getErrorCount() >= getMaxError(item)) {
{code}
The error count and the max error count will both be zero, so this condition
will be true and you will end up setting an error anyway.

Instead of this:

{code:java}
+          (item.getErrorCount() == 0 || item.getErrorCount() < getMaxError(item))) {
{code}
Shouldn't we just have:
{code:java}
          item.getErrorCount() <= getMaxError(item)) {
{code}

And perhaps also tweak the if condition at L926 to remove the '=' sign?
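
To make the boundary case concrete, here is a minimal standalone sketch. It is not the DiskBalancer code: the class `MaxDiskErrorsSketch`, the `simulate` method and its flags are hypothetical, and the loop is simplified to walk over a list of blocks where each entry says whether copying that block fails. It only compares how the two comparisons behave when the max error count is 0.

```java
public class MaxDiskErrorsSketch {

    /** Outcome of one simulated move (hypothetical helper type). */
    static final class Result {
        final int blocksCopied;
        final boolean errorFlagged;

        Result(int blocksCopied, boolean errorFlagged) {
            this.blocksCopied = blocksCopied;
            this.errorFlagged = errorFlagged;
        }

        @Override
        public String toString() {
            return "copied=" + blocksCopied + ", errorFlagged=" + errorFlagged;
        }
    }

    /**
     * Simulates a move over blockFails.length blocks.
     *
     * @param blockFails       blockFails[i] is true if copying block i throws
     * @param maxErrors        stand-in for getMaxError(item)
     * @param inclusiveLoop    true: loop while errorCount <= maxErrors;
     *                         false: loop while errorCount < maxErrors
     * @param strictFinalCheck true: flag when errorCount > maxErrors;
     *                         false: flag when errorCount >= maxErrors
     */
    static Result simulate(boolean[] blockFails, int maxErrors,
                           boolean inclusiveLoop, boolean strictFinalCheck) {
        int errorCount = 0;
        int copied = 0;
        int i = 0;
        while (i < blockFails.length
                && (inclusiveLoop ? errorCount <= maxErrors
                                  : errorCount < maxErrors)) {
            if (blockFails[i]) {
                errorCount++;   // mirrors item.incErrorCount()
            } else {
                copied++;
            }
            i++;
        }
        // Mirrors the check after the loop that sets the error message.
        boolean flagged = strictFinalCheck ? errorCount > maxErrors
                                           : errorCount >= maxErrors;
        return new Result(copied, flagged);
    }

    public static void main(String[] args) {
        boolean[] healthy = {false, false, false};
        boolean[] oneFailure = {false, true, false};

        System.out.println("current conditions, healthy: "
                + simulate(healthy, 0, false, false));
        System.out.println("loop fixed only, healthy:    "
                + simulate(healthy, 0, true, false));
        System.out.println("both changed, healthy:       "
                + simulate(healthy, 0, true, true));
        System.out.println("both changed, one failure:   "
                + simulate(oneFailure, 0, true, true));
    }
}
```

In this sketch, with a max error count of 0 and no failures, the current conditions copy nothing and flag an error; changing only the loop condition copies all three blocks but still flags an error; changing both copies all three without flagging. With one failing block and both changes, the move stops at the first error and is flagged, which matches "no error tolerance".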

cc [~aengineer]: you wrote this up, any pointers?

> Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
> ----------------------------------------------------------------------
>
>                 Key: HDFS-15438
>                 URL: https://issues.apache.org/jira/browse/HDFS-15438
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>            Reporter: AMC-team
>            Priority: Major
>         Attachments: HDFS-15438.000.patch
>
>
> In the HDFS disk balancer, the config parameter 
> "dfs.disk.balancer.max.disk.errors" controls the maximum number of errors 
> that can be ignored for a specific move between two disks before the move is 
> abandoned.
> The parameter accepts any value >= 0, and setting it to 0 should mean no 
> error tolerance. However, with the value set to 0 the block copy is never 
> performed, even when no disk error occurs, because the while loop 
> condition *item.getErrorCount() < getMaxError(item)* is never satisfied.
> {code:java}
> // Gets the next block that we can copy
> private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter,
>                                      DiskBalancerWorkItem item) {
>   // With max errors = 0 this condition is false on entry, so no block is read
>   while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) {
>     try {
>       ... // get the block
>     } catch (IOException e) {
>       item.incErrorCount();
>     }
>   }
>   // With max errors = 0 and no errors, 0 >= 0 is true, so an error is set
>   if (item.getErrorCount() >= getMaxError(item)) {
>     item.setErrMsg("Error count exceeded.");
>     LOG.info("Maximum error count exceeded. Error count: {} Max error:{} ",
>         item.getErrorCount(), item.getMaxDiskErrors());
>   }
>   ...
> }
> {code}
> *How to fix*
> Change the while loop condition to support value 0.
>   



