[ https://issues.apache.org/jira/browse/HDFS-10716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiqun Lin updated HDFS-10716:
-----------------------------
    Description: 
In HDFS-10602, we found a failing case where the balancer keeps moving data 
between the same 2 DNs, so it can never finish. While debugging this, I found 
what looks like a bug in how pending blocks are chosen in 
{{Dispatcher.Source.chooseNextMove}}.

The code:
{code}
    private PendingMove chooseNextMove() {
      for (Iterator<Task> i = tasks.iterator(); i.hasNext();) {
        final Task task = i.next();
        final DDatanode target = task.target.getDDatanode();
        final PendingMove pendingBlock = new PendingMove(this, task.target);
        if (target.addPendingBlock(pendingBlock)) {
          // target is not busy, so do a tentative block allocation
          if (pendingBlock.chooseBlockAndProxy()) {
            long blockSize = pendingBlock.reportedBlock.getNumBytes(this);
            incScheduledSize(-blockSize);
            task.size -= blockSize;
            // If the size of bytes that need to be moved was first reduced to less than 0
            // it should also be removed.
            if (task.size == 0) {
              i.remove();
            }
            return pendingBlock;
            //...
{code}
The value of {{task.size}} is assigned in {{Balancer#matchSourceWithTargetToMove}}:
{code}
    long size = Math.min(source.availableSizeToMove(), target.availableSizeToMove());
    final Task task = new Task(target, size);
{code}

This value depends on the source and target nodes, and it will not always be 
reduced exactly to 0 while choosing pending blocks. As a result, the balancer 
keeps moving data to the target node even after the number of bytes that need 
to be moved has already dropped below 0. This eventually makes the cluster 
imbalanced again, which triggers another balancer iteration.
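
To make the arithmetic concrete, here is a minimal, self-contained toy sketch 
(this is not Hadoop code; the 33/100/-67 values are only assumptions chosen to 
match the log excerpt further below) showing how subtracting whole block sizes 
can jump over 0, so a check for {{task.size == 0}} never fires:
{code}
// Toy illustration only -- not code from the Hadoop balancer.
public class TaskSizeUnderflowDemo {
  public static void main(String[] args) {
    long taskSize = 33;    // bytes still scheduled to move to the target (assumed)
    long blockSize = 100;  // size of the block picked for the move (assumed)

    taskSize -= blockSize; // 33 - 100 = -67: the value skips 0 entirely
    System.out.println("task.size after the move: " + taskSize);

    // With a check of the form "if (task.size == 0)" this task is never removed,
    // so moves keep being scheduled to an already-satisfied target.
  }
}
{code}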

We can optimize this as the title suggests; I think it can also speed up the 
balancer.
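
A minimal sketch of the kind of change the title describes (I have not verified 
this against HDFS-10716.001.patch; it simply relaxes the removal check from the 
first snippet so that a task is also dropped once {{task.size}} goes negative):
{code}
            long blockSize = pendingBlock.reportedBlock.getNumBytes(this);
            incScheduledSize(-blockSize);
            task.size -= blockSize;
            // Remove the task not only when task.size hits exactly 0, but also
            // when it has been reduced below 0.
            if (task.size <= 0) {
              i.remove();
            }
            return pendingBlock;
{code}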

See the attached log for the failing case, or see HDFS-10602 (focus on the 
change record for the scheduled size of the target node; that is debug info I 
added, like this):
{code}
2016-08-01 16:51:57,492 [pool-51-thread-1] INFO  balancer.Dispatcher (Dispatcher.java:chooseNextMove(799)) - TargetNode: 58794, bytes scheduled to move, after: -67, before: 33
{code}

> The target node should be removed in the balancer when the size of bytes that 
> need to be moved is first reduced to less than 0
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10716
>                 URL: https://issues.apache.org/jira/browse/HDFS-10716
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>         Attachments: HDFS-10716.001.patch, failing.log
>
>


