[jira] [Commented] (HDFS-11742) Improve balancer usability after HDFS-8818

Kihwal Lee (JIRA) Mon, 05 Jun 2017 08:05:30 -0700

    [ https://issues.apache.org/jira/browse/HDFS-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037072#comment-16037072 ]


Kihwal Lee commented on HDFS-11742:
-----------------------------------

!https://issues.apache.org/jira/secure/attachment/12871245/balancer_fix.png!

[~shv], here is a graph. 

HDFS-8818 + HDFS-11377 can potentially move blocks faster, but only if the 
number of mover threads is raised very high. "High" is relative to the size of 
the cluster and depends on the nature of the imbalance. On one of our clusters 
(~2,500 nodes), setting it to 10,000 wasn't enough. The balancer does create 
10,000 threads, but only a subset of them are utilized. Nicholas previously 
suggested 30,000, and while that would have "worked", it effectively means 
HDFS-8818 requires the mover thread limit to be removed.

What I did here is honor the configured mover thread limit (default=1,000) 
and size each thread pool accordingly (#movers / #targets) instead of using a 
fixed number (default max=50). I've verified that it works as well as, and 
sometimes better than, the 2.7 balancer with an identical config.
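A minimal sketch of the sizing rule described above, assuming each balancer target gets its own pool and the per-target size is the global mover-thread limit divided by the number of targets. The class and method names (MoverPoolSizing, perTargetPoolSize, totalThreads) are hypothetical and are not the Dispatcher code in the attached patches. The floor of one thread per target is also an assumption.

```java
// Hypothetical illustration of sizing per-target mover pools; not the actual HDFS patch.
class MoverPoolSizing {
    // Fixed per-target cap, mirroring the "default max=50" mentioned in the comment.
    static final int FIXED_PER_TARGET = 50;

    /** Per-target pool size derived from the global mover-thread limit (sketch). */
    static int perTargetPoolSize(int moverThreads, int numTargets) {
        if (numTargets <= 0) {
            throw new IllegalArgumentException("numTargets must be positive");
        }
        // Integer division spreads the global limit evenly across targets.
        // Assumption: every target keeps at least one thread, so with more
        // targets than threads the total can exceed the configured limit.
        return Math.max(1, moverThreads / numTargets);
    }

    /** Total threads used across all targets for a given per-target size. */
    static long totalThreads(int perTarget, int numTargets) {
        return (long) perTarget * numTargets;
    }

    public static void main(String[] args) {
        int moverThreads = 1000; // default mover thread limit cited in the comment
        int[] targetCounts = {10, 100, 2500};
        for (int targets : targetCounts) {
            int sized = perTargetPoolSize(moverThreads, targets);
            System.out.printf("targets=%d perTarget=%d total=%d (fixed cap total=%d)%n",
                    targets, sized, totalThreads(sized, targets),
                    totalThreads(FIXED_PER_TARGET, targets));
        }
    }
}
```

With a fixed cap, the total thread count grows with the number of targets regardless of the configured limit. Dividing the limit by the number of targets instead keeps the total near the configured value.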

> Improve balancer usability after HDFS-8818
> ------------------------------------------
>
>                 Key: HDFS-11742
>                 URL: https://issues.apache.org/jira/browse/HDFS-11742
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Blocker
>              Labels: release-blocker
>         Attachments: balancer2.8.png, balancer_fix.png, 
> HDFS-11742.branch-2.8.patch, HDFS-11742.branch-2.patch, 
> HDFS-11742.trunk.patch, HDFS-11742.v2.trunk.patch
>
>
> We ran the 2.8 balancer with HDFS-8818 on a 280-node and a 2,400-node 
> cluster. In both cases, it hung forever after two iterations, and those two 
> iterations moved data at a significantly lower rate. The hang itself is fixed 
> by HDFS-11377, but the design limitation remains, so the balancer throughput 
> actually ends up lower.
> Instead of reverting HDFS-8818 as originally suggested, I am making a small 
> change to make it less error-prone and more usable.



