[ https://issues.apache.org/jira/browse/HDFS-11015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kihwal Lee updated HDFS-11015:
------------------------------
Attachment: HDFS-11015-2.patch
Attaching the updated patch.
> Enforce timeout in balancer
> ---------------------------
>
> Key: HDFS-11015
> URL: https://issues.apache.org/jira/browse/HDFS-11015
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Attachments: HDFS-11015-1.patch, HDFS-11015-2.patch, balancer.png
>
>
> 1) Hung node detection: HDFS-6247 removed the socket read timeout when it
> added the periodic response for slow block moves. However, removing the
> long timeout was not necessary. The timeout is still useful for detecting
> hung nodes, and it does not abort slow moves, since the periodic responses
> keep the connection active.
> 2) Enforcing the iteration limit: The 20-minute iteration limit is supposed
> to be enforced, but it is not. An iteration can easily stretch to 30 to 40
> minutes with a long tail. Because of these long tails, the balancer
> throughput does not reach its full potential.
> 3) Slow move detection: For improved throughput, imposing a block move
> timeout is sometimes necessary. We have seen an iteration take over 2 hours
> because of one slow block move. The timeout is mainly for catching
> exceptionally slow moves. Even if the balancer stops waiting, the move will
> continue and finish.
> To avoid undoing what HDFS-6247 tried to achieve, it should be possible to
> disable 3) through configuration.
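
For readers following along, items 2) and 3) above can be sketched roughly as below. This is a minimal illustration and not the code in the attached patch. The class, method, and field names are all hypothetical, and two things are assumptions: the policy of not scheduling new moves once the iteration limit has passed, and the convention that a non-positive move timeout means "disabled". Item 1) (the socket read timeout) is not covered here.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: all names here are hypothetical, not Balancer code. */
class BalancerTimeoutSketch {

    /** A block move the balancer is waiting on, reduced to what the checks need. */
    static final class PendingMove {
        final String blockId;
        final long startMs;

        PendingMove(String blockId, long startMs) {
            this.blockId = blockId;
            this.startMs = startMs;
        }
    }

    private final long maxIterationMs; // e.g. the 20-minute iteration limit
    private final long moveTimeoutMs;  // assumed convention: <= 0 disables the check

    BalancerTimeoutSketch(long maxIterationMs, long moveTimeoutMs) {
        this.maxIterationMs = maxIterationMs;
        this.moveTimeoutMs = moveTimeoutMs;
    }

    /** Item 2 (assumed policy): schedule no new moves once the limit has passed. */
    boolean maySchedule(long iterationStartMs, long nowMs) {
        return nowMs - iterationStartMs < maxIterationMs;
    }

    /** Item 3: a move is "too slow" only when the timeout is enabled and exceeded. */
    boolean moveTimedOut(PendingMove move, long nowMs) {
        return moveTimeoutMs > 0 && nowMs - move.startMs >= moveTimeoutMs;
    }

    /**
     * Returns the moves the balancer should keep waiting on. Moves dropped here
     * are not cancelled; the balancer just stops waiting for them.
     */
    List<PendingMove> stillWaitingOn(List<PendingMove> pending, long nowMs) {
        List<PendingMove> keep = new ArrayList<>();
        for (PendingMove move : pending) {
            if (!moveTimedOut(move, nowMs)) {
                keep.add(move);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        BalancerTimeoutSketch sketch = new BalancerTimeoutSketch(20 * minute, 30 * minute);

        List<PendingMove> pending = new ArrayList<>();
        pending.add(new PendingMove("blk_1", 0L));          // started at t = 0
        pending.add(new PendingMove("blk_2", 25 * minute)); // started at t = 25 min

        long now = 31 * minute;
        System.out.println("may schedule new moves: " + sketch.maySchedule(0L, now));
        for (PendingMove move : sketch.stillWaitingOn(pending, now)) {
            System.out.println("still waiting on " + move.blockId);
        }
    }
}
```

Passing 0 as the second constructor argument turns the slow move check off in this sketch, which corresponds to the request above that 3) be possible to disable.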