[ 
https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128604#comment-14128604
 ] 

Yongjun Zhang commented on HDFS-6621:
-------------------------------------

Thank you so much [~szetszwo]! 

Hi [~ravwojdyla] and [~bbowman410], I wonder if either of you could provide a patch
for the first problem here soon? If you won't get to it soon, I hope you don't
mind if I put one up on your behalf. We will continue discussing the right
approach for problem 2 in a new JIRA. Thanks a lot.

> Hadoop Balancer prematurely exits iterations
> --------------------------------------------
>
>                 Key: HDFS-6621
>                 URL: https://issues.apache.org/jira/browse/HDFS-6621
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer
>    Affects Versions: 2.2.0, 2.4.0
>         Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0
>            Reporter: Benjamin Bowman
>              Labels: balancer
>         Attachments: HDFS-6621.patch, HDFS-6621.patch_2, HDFS-6621.patch_3, HDFS-6621.patch_4
>
>
> I have been having an issue with the balancer being too slow.  The issue was
> not the speed at which blocks were moved, but rather that the balancer
> would prematurely exit its balancing iterations.  It would move ~10
> blocks or 100 MB and then exit the current iteration (in which it said it was
> planning to move about 10 GB).
> I looked in the Balancer.java code and believe I found and solved the issue.
> In the dispatchBlocks() function there is a variable,
> "noPendingBlockIteration", which counts the number of iterations in which a
> pending block to move cannot be found.  Once this number reaches 5
> (MAX_NO_PENDING_BLOCK_ITERATIONS), the balancer exits the overall balancing
> iteration.  I believe the intended behavior is to stop after 5 consecutive
> no-pending-block iterations; however, this variable is never reset to 0 when
> a block is moved.  So once this number reaches 5, even if thousands of blocks
> have been moved in between these no-pending-block iterations, the overall
> balancing iteration ends prematurely.
> The fix I applied was to set noPendingBlockIteration = 0 whenever a pending
> block is found and scheduled.  This way, my iterations do not exit prematurely
> unless there are 5 consecutive no-pending-block iterations.  Below is a copy
> of my dispatchBlocks() function with the change I made.
> {code}
>     private void dispatchBlocks() {
>       long startTime = Time.now();
>       long scheduledSize = getScheduledSize();
>       this.blocksToReceive = 2*scheduledSize;
>       boolean isTimeUp = false;
>       int noPendingBlockIteration = 0;
>       while(!isTimeUp && getScheduledSize()>0 &&
>           (!srcBlockList.isEmpty() || blocksToReceive>0)) {
>         PendingBlockMove pendingBlock = chooseNextBlockToMove();
>         if (pendingBlock != null) {
>           // HDFS-6621: a block was found, so reset the consecutive-miss counter
>           noPendingBlockIteration = 0;
>           // move the block
>           pendingBlock.scheduleBlockMove();
>           continue;
>         }
>         /* Since we can not schedule any block to move,
>          * filter any moved blocks from the source block list and
>          * check if we should fetch more blocks from the namenode
>          */
>         filterMovedBlocks(); // filter already moved blocks
>         if (shouldFetchMoreBlocks()) {
>           // fetch new blocks
>           try {
>             blocksToReceive -= getBlockList();
>             continue;
>           } catch (IOException e) {
>             LOG.warn("Exception while getting block list", e);
>             return;
>           }
>         } else {
>           // source node cannot find a pendingBlockToMove, iteration +1
>           noPendingBlockIteration++;
>           // in case no blocks can be moved for source node's task,
>           // jump out of while-loop after 5 iterations.
>           if (noPendingBlockIteration >= MAX_NO_PENDING_BLOCK_ITERATIONS) {
>             setScheduledSize(0);
>           }
>         }
>         // check if time is up or not
>         if (Time.now()-startTime > MAX_ITERATION_TIME) {
>           isTimeUp = true;
>           continue;
>         }
>         /* Now we can not schedule any block to move and there are
>          * no new blocks added to the source block list, so we wait.
>          */
>         try {
>           synchronized(Balancer.this) {
>             Balancer.this.wait(1000);  // wait for targets/sources to be idle
>           }
>         } catch (InterruptedException ignored) {
>         }
>       }
>     }
> {code}
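To make the difference between the original (cumulative) counter and the patched (consecutive) counter concrete, here is a small standalone Java simulation. It is not Balancer code: the class NoPendingBlockCounterDemo, the methods scheduledBeforeExit and twoHitsThenMiss, and the outcome pattern are all hypothetical. It models only the counter logic; a "miss" stands for a pass where no pending block is found and shouldFetchMoreBlocks() is false, and the 1-second wait and MAX_ITERATION_TIME check are ignored.

{code}
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-alone simulation (not Balancer code). It replays a
 * sequence of dispatch-loop outcomes and counts how many blocks are
 * scheduled before the no-pending-block counter hits the limit.
 */
class NoPendingBlockCounterDemo {
    // Same limit of 5 that the issue describes for MAX_NO_PENDING_BLOCK_ITERATIONS.
    static final int MAX_NO_PENDING_BLOCK_ITERATIONS = 5;

    /**
     * @param outcomes       true = a pending block was found and scheduled,
     *                       false = no pending block and nothing more to fetch
     * @param resetOnSuccess true models the patched (consecutive) counter,
     *                       false models the original (cumulative) counter
     * @return number of blocks scheduled before the loop gives up
     */
    static int scheduledBeforeExit(List<Boolean> outcomes, boolean resetOnSuccess) {
        int noPendingBlockIteration = 0;
        int scheduled = 0;
        for (boolean found : outcomes) {
            if (found) {
                scheduled++;
                if (resetOnSuccess) {
                    noPendingBlockIteration = 0; // the one-line fix
                }
                continue;
            }
            noPendingBlockIteration++;
            if (noPendingBlockIteration >= MAX_NO_PENDING_BLOCK_ITERATIONS) {
                break; // stands in for setScheduledSize(0) ending the while-loop
            }
        }
        return scheduled;
    }

    /** Builds {@code rounds} repetitions of: block, block, miss. */
    static List<Boolean> twoHitsThenMiss(int rounds) {
        List<Boolean> outcomes = new ArrayList<>();
        for (int i = 0; i < rounds; i++) {
            outcomes.add(true);
            outcomes.add(true);
            outcomes.add(false);
        }
        return outcomes;
    }

    public static void main(String[] args) {
        List<Boolean> outcomes = twoHitsThenMiss(10);
        System.out.println("cumulative counter:  " + scheduledBeforeExit(outcomes, false));
        System.out.println("consecutive counter: " + scheduledBeforeExit(outcomes, true));
    }
}
{code}

With ten rounds of "two scheduled blocks, then one miss", the cumulative counter gives up at the fifth miss after scheduling 10 blocks, while resetting on success lets all 20 blocks through. Under these simplifying assumptions, that is the same pattern the reporter describes: plenty of blocks are movable, but scattered misses add up to 5 and end the iteration early.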



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
