Anu Engineer commented on HDFS-13728:

Let us do this change; however, there are two major points.
 * This precondition is correct. If it is being violated, we have a latent 
bug in the code.
 * I am fine with disabling the precondition, but I have two different ways in 
mind to do it:
 ** Convert the Precondition to a LOG.warn or LOG.error.
 ** Allow the operation to proceed on the DataNode if and only if the *-force* 
flag is specified. This means that one of us is willing to hunt down this bug 
in the long run; for the short term, it is not on our radar.

[~sodonnell] Thank you for root-causing these issues and posting the 
suggestions. Since you have really done all the hard work, would you do the 
honors of posting a patch too? That is, just add a LOG.error message to the 
suggested change.
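To make the first option concrete, here is a minimal standalone sketch (not the actual patch) of what a relaxed setUsed could look like. The class is a mock-up of DiskBalancerVolume, and java.util.logging stands in for the logger the real class would use; all names here are illustrative.

```java
import java.util.logging.Logger;

// Illustrative sketch of option 1: replace the Precondition in
// DiskBalancerVolume.setUsed with a log message and cap the value.
// This is a standalone mock-up, not the actual HDFS class.
public class DiskBalancerVolumeSketch {
  private static final Logger LOG =
      Logger.getLogger(DiskBalancerVolumeSketch.class.getName());

  private final long capacity;
  private long used;

  public DiskBalancerVolumeSketch(long capacity) {
    this.capacity = capacity;
  }

  public long getCapacity() {
    return capacity;
  }

  public long getUsed() {
    return used;
  }

  public void setUsed(long dfsUsedSpace) {
    if (dfsUsedSpace > getCapacity()) {
      // Instead of throwing, log the inconsistency and cap used space at
      // capacity so the balancer's density calculations stay in range.
      LOG.severe(String.format(
          "DiskBalancerVolume.setUsed: dfsUsedSpace(%d) > capacity(%d);"
              + " capping used space at capacity.",
          dfsUsedSpace, getCapacity()));
      this.used = getCapacity();
    } else {
      this.used = dfsUsedSpace;
    }
  }
}
```

With something like this, a DataNode reporting used > capacity would no longer abort the balancer run, and the over-report would still be visible in the logs for later investigation.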

> Disk Balancer should not fail if volume usage is greater than capacity
> ---------------------------------------------------------------------
>                 Key: HDFS-13728
>                 URL: https://issues.apache.org/jira/browse/HDFS-13728
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: diskbalancer
>    Affects Versions: 3.0.3
>            Reporter: Stephen O'Donnell
>            Priority: Minor
> We have seen a couple of scenarios where the disk balancer fails because a 
> datanode reports more space used on a disk than its capacity, which should 
> not be possible.
> This is due to the check below in DiskBalancerVolume.java:
> {code}
>   public void setUsed(long dfsUsedSpace) {
>     Preconditions.checkArgument(dfsUsedSpace < this.getCapacity(),
>         "DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
>         dfsUsedSpace, getCapacity());
>     this.used = dfsUsedSpace;
>   }
> {code}
> While I agree that it should not be possible for a DN to report more usage on 
> a volume than its capacity, there seems to be some issue that causes this to 
> occur sometimes.
> In general, a full disk like this is what prompts someone to run the Disk 
> Balancer in the first place, only to find that it fails with this error.
> There appears to be nothing you can do to force the Disk Balancer to run at 
> this point; but in the scenarios I saw, some data was removed from the disk 
> and usage dropped below the capacity, resolving the issue.
> Can we consider relaxing the above check so that, if the usage is greater 
> than the capacity, the usage is simply set to the capacity and the 
> calculations all work OK?
> E.g. something like this:
> {code}
>    public void setUsed(long dfsUsedSpace) {
> -    Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
> -    this.used = dfsUsedSpace;
> +    if (dfsUsedSpace > this.getCapacity()) {
> +      this.used = this.getCapacity();
> +    } else {
> +      this.used = dfsUsedSpace;
> +    }
>    }
> {code}

This message was sent by Atlassian JIRA
