[ https://issues.apache.org/jira/browse/HDFS-11015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15595313#comment-15595313 ]

Kihwal Lee commented on HDFS-11015:
-----------------------------------

Heh. Looks like the unit tests are doing their job.

> Enforce timeout in balancer
> ---------------------------
>
>                 Key: HDFS-11015
>                 URL: https://issues.apache.org/jira/browse/HDFS-11015
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>         Attachments: HDFS-11015-1.patch, HDFS-11015-2.patch, balancer.png
>
>
> 1) Hung node detection: HDFS-6247 removed the socket read timeout when it
> added the periodic response for slow block moves. However, removing the long
> timeout was not necessary. Because a slow move keeps sending periodic
> responses, the timeout does not abort slow moves, and it is still useful for
> detecting hung nodes.
> 2) Enforcing the iteration limit: The 20-minute iteration limit is supposed to
> be enforced, but it is not. An iteration can easily stretch to 30 to 40
> minutes because of a long tail of slow moves, and these long tails keep the
> balancer from reaching its full throughput.
> 3) Slow move detection: For better throughput, a block move timeout is
> sometimes necessary. We have seen an iteration take over 2 hours because of
> one slow block move. The timeout is mainly for catching exceptionally slow
> moves. Even if the balancer stops waiting, the move itself will continue and
> finish.
> To avoid undoing what HDFS-6247 tried to achieve, it should be possible to
> turn off 3) through configuration.
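
The reasoning in item 1) can be sketched in Python. This is not the balancer's Java code: the one-byte message format, the names IN_PROGRESS and SUCCESS, and the function wait_for_move are all hypothetical. What it illustrates is that a socket read timeout bounds each read separately, so a move that keeps sending periodic responses is never cut off, while a peer that sends nothing for the whole timeout is reported as hung.

```python
import socket

# Hypothetical one-byte messages; the real balancer protocol is different.
IN_PROGRESS = b"P"  # periodic "still moving" response, the idea HDFS-6247 introduced
SUCCESS = b"S"      # move finished


def wait_for_move(sock, read_timeout_s):
    """Wait for a block move reply, treating a silent peer as hung.

    settimeout() bounds each recv() call separately, so every IN_PROGRESS
    message effectively restarts the clock: slow moves are not aborted.
    """
    sock.settimeout(read_timeout_s)
    while True:
        try:
            msg = sock.recv(1)
        except socket.timeout:
            return "hung"      # nothing arrived within read_timeout_s
        if msg == b"":
            return "closed"    # peer closed the connection
        if msg == SUCCESS:
            return "success"
        # IN_PROGRESS: keep waiting with a fresh per-read timeout.


if __name__ == "__main__":
    a, b = socket.socketpair()
    with a, b:
        b.sendall(IN_PROGRESS * 3 + SUCCESS)  # slow but alive
        print(wait_for_move(a, 0.5))          # success
    c, d = socket.socketpair()
    with c, d:
        print(wait_for_move(c, 0.1))          # hung: peer never answers
```

The design point is that the timeout is long relative to the interval between periodic responses, so only a node that has stopped responding entirely trips it.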
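
Items 2) and 3) can be sketched the same way, again as a hypothetical Python illustration rather than the balancer's implementation; wait_for_iteration, its parameters, and the polling interval are assumptions. It stops waiting once the iteration limit passes, and separately stops waiting for any single move older than move_timeout_s, where 0 turns that check off as the description asks. Abandoned futures are not cancelled, so their work still runs to completion, mirroring "the move will continue and finish".

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def wait_for_iteration(moves, iteration_limit_s, move_timeout_s=0.0, poll_s=0.05):
    """Hypothetical sketch: wait for block moves under two limits.

    moves: list of (Future, start_time) pairs, start_time from time.monotonic().
    iteration_limit_s: hard cap on how long this iteration waits (item 2).
    move_timeout_s: stop waiting for a move older than this; 0 disables it (item 3).
    Returns (completed, abandoned). Abandoned moves are left running.
    """
    deadline = time.monotonic() + iteration_limit_s
    pending = dict(moves)  # Future -> start time of that move
    completed = abandoned = 0
    while pending:
        # Count moves that have already finished.
        for fut in [f for f in pending if f.done()]:
            del pending[fut]
            completed += 1
        now = time.monotonic()
        # Item 3: give up waiting on exceptionally slow moves (0 disables this).
        if move_timeout_s > 0:
            for fut in [f for f, t0 in pending.items() if now - t0 >= move_timeout_s]:
                del pending[fut]
                abandoned += 1
        # Item 2: enforce the iteration limit, abandoning whatever is left.
        if now >= deadline:
            abandoned += len(pending)
            pending.clear()
        if pending:
            # Sleep until a move finishes or the next poll, whichever comes first.
            wait(list(pending), timeout=min(poll_s, deadline - now),
                 return_when=FIRST_COMPLETED)
    return completed, abandoned


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        start = time.monotonic()
        fast = pool.submit(time.sleep, 0.01)
        slow = pool.submit(time.sleep, 0.6)  # one long-tail move
        print(wait_for_iteration([(fast, start), (slow, start)],
                                 iteration_limit_s=5.0, move_timeout_s=0.2))
```

Polling every poll_s seconds keeps the sketch short; a real dispatcher could instead compute its next wake-up from the earliest pending deadline.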



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
