[jira] Commented: (HDFS-1105) Balancer improvement

Hairong Kuang (JIRA) Thu, 29 Apr 2010 16:16:23 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862470#action_12862470
 ]


Hairong Kuang commented on HDFS-1105:
-------------------------------------

Thanks, Dmytro, for uploading a new patch. I really like the changes you made! 
Here are more review comments:
# The major contribution of the patch is that it enforces the max time for each 
iteration, including the time spent waiting for moves to complete. I would 
prefer the structure of dispatchBlockMoves to be
{code}
{
  long startTime = Util.now();
  // start threads to schedule & dispatch block moves; pass startTime to
  // each thread, as you do in your patch
  waitForMoveCompletion(startTime); // pass startTime as well; returns once
                                    // the max iteration time is reached
}
{code}
This way, you do not need to introduce a new heuristic for 
waitForMoveCompletion to quit, as you do in your patch.
# I would prefer PendingBlockMove#closeSocket() to call sock.close() instead of 
closing only its input stream. I understand that the final section of 
receiveResponse() closes the socket. However, it is nice to release all its 
resources in one shot, even in PendingBlockMove#closeSocket(). receiveResponse() 
should catch EOFException before catching IOException to avoid printing two log 
messages for one exception. The log message for EOFException should simply say 
EOFException, because sometimes it may not be caused by 
PendingBlockMove#closeSocket().
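
To illustrate the structure suggested in the first comment, here is a minimal, self-contained sketch in plain Java. It is not the Balancer's code: the class BoundedIteration, the method runIteration and its parameters are hypothetical, System.currentTimeMillis() stands in for Util.now(), and an ExecutorService stands in for the dispatcher threads. The point is only that a single startTime bounds both the scheduling and the waiting, so no separate heuristic is needed to stop waiting.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a time-bounded iteration; not the Balancer's actual code. */
final class BoundedIteration {

  /**
   * Runs the given tasks and gives the whole iteration (scheduling plus
   * waiting) at most maxIterationMillis. Returns true if all tasks
   * finished before the deadline.
   */
  static boolean runIteration(List<Runnable> moveTasks, int numThreads,
                              long maxIterationMillis) {
    final long startTime = System.currentTimeMillis(); // stands in for Util.now()
    ExecutorService dispatcher = Executors.newFixedThreadPool(numThreads);
    for (Runnable task : moveTasks) {
      dispatcher.execute(() -> {
        // A task that starts after the deadline does no work.
        if (System.currentTimeMillis() - startTime < maxIterationMillis) {
          task.run();
        }
      });
    }
    dispatcher.shutdown(); // accept no further tasks
    return waitForMoveCompletion(dispatcher, startTime, maxIterationMillis);
  }

  /** Waits only for the time left in the iteration, measured from startTime. */
  static boolean waitForMoveCompletion(ExecutorService dispatcher,
                                       long startTime, long maxIterationMillis) {
    long remaining = startTime + maxIterationMillis - System.currentTimeMillis();
    boolean done = false;
    try {
      done = dispatcher.awaitTermination(Math.max(remaining, 0L),
                                         TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
    }
    if (!done) {
      dispatcher.shutdownNow(); // interrupt the moves that are still in progress
    }
    return done;
  }
}
```

In this sketch, tasks that overrun are interrupted via shutdownNow(); in the Balancer the corresponding step would be closing the sockets of the pending moves.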

Other minor comments:
# should remove unused imports;
# MAX_NUM_CONCURRENT_MOVE should not drop modifier "final";
# should keep all option parsing & balancer initialization in one method "init";
# Replace timeToStr with your new time format and call timeToStr(timeLeft) in 
Balancer#run();
# It is not user friendly to print exception stack trace on the screen.
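
As a sketch of the catch ordering asked for in the second comment, and of reporting a one-line message instead of a stack trace, consider the following. The class ResponseReader and the 2-byte status are hypothetical and chosen only for illustration; the real receiveResponse() reads whatever the block-move protocol defines.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch of the catch ordering; not the Balancer's actual code. */
final class ResponseReader {

  /**
   * Reads a (hypothetical) 2-byte status and returns a one-line description
   * of the outcome, suitable for a single log message.
   */
  static String receiveResponse(InputStream in) {
    // try-with-resources releases the stream on every path.
    try (DataInputStream reply = new DataInputStream(in)) {
      short status = reply.readShort();
      return "status " + status;
    } catch (EOFException e) {
      // EOFException is a subclass of IOException, so it must be caught
      // first; otherwise the IOException handler would take it. The message
      // does not guess at the cause of the EOF.
      return "EOFException";
    } catch (IOException e) {
      // One short line for the user; no stack trace.
      return "IOException: " + e.getMessage();
    }
  }
}
```

Because each exception reaches exactly one handler, a caller that logs the returned string produces one message per failure.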

> Balancer improvement
> --------------------
>
>                 Key: HDFS-1105
>                 URL: https://issues.apache.org/jira/browse/HDFS-1105
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Dmytro Molkov
>            Assignee: Dmytro Molkov
>         Attachments: HDFS-1105.2.patch, HDFS-1105.3.patch, HDFS-1105.patch
>
>
> We were seeing some weird issues with the balancer in our cluster:
> 1) it can get stuck during an iteration and only restarting it helps
> 2) the iterations are highly inefficient. With a 20-minute iteration it moves 
> 7K blocks a minute for the first 6 minutes and hundreds of blocks in the next 
> 14 minutes
> 3) it can hit namenode and the network pretty hard
> A few improvements we came up with as a result:
> Making balancer more deterministic in terms of running time of iteration, 
> improving the efficiency and making the load configurable:
> Make many of the constants configurable command line parameters: Iteration 
> length, number of blocks to move in parallel to a given node and in cluster 
> overall.
> Terminate transfers that are still in progress after iteration is over.
> Previously iteration time was the time window in which the balancer was 
> scheduling the moves and then it would wait for the moves to finish 
> indefinitely. Each scheduling task can run up to iteration time or even 
> longer. This means if you have too many of them and they are long your actual 
> iterations are longer than 20 minutes. Now each scheduling task has a time of 
> the start of iteration and it should schedule the moves only if it did not 
> run out of time. So the tasks that have started after the iteration is over 
> will not schedule any moves.
> The number of move threads and dispatch threads is configurable so that 
> depending on the load of the cluster you can run it slower.
> I will attach a patch, please let me know what you think and what can be done 
> better.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
