[ 
https://issues.apache.org/jira/browse/HDFS-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15999105#comment-15999105
 ] 

Kihwal Lee commented on HDFS-11742:
-----------------------------------

bq. I probably still not.
I invite everyone to run Balancer post-HDFS-8818 on their moderately sized 
clusters with the existing or default settings.  This includes you, 
[~szetszwo].  I am not talking about the quick-expansion type of balancing that 
you seem to focus on, but about steady-state balancing. 

bq. BTW, the replaceblockoperationspersec metrics you shown earlier. Is it just 
for one datanode? Have you checked the other datanodes?
This is an aggregate of all nodes. 

bq.  I am not sure it is the right approach since the datanode pairs are sorted 
by priorities according to the utilization and data locality. The patch tries 
to schedule the same number of threads to all pairs.

In HDFS-8818, the thread pool is created per target, not per pair, and it tries 
to assign the same fixed number of threads to each thread pool, does it not?  
This does not change in my patch. I am simply adjusting the size of each thread 
pool so that it does not exceed the limit, which avoids the skipping problem. 
Once moves start getting skipped, the throughput can go down. 
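
To illustrate the idea of capping a per-target pool, here is a minimal sketch. 
It is not the code from the patch: the class, method, and parameter names are 
hypothetical, and it assumes "the limit" is a per-datanode cap on concurrent 
moves. It only shows that the pool size becomes the smaller of the requested 
thread count and that cap, so a pool never has more threads than the cap allows.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch; not the Balancer's actual Dispatcher code.
class TargetPoolSizing {

    /**
     * Returns the number of threads to give one target's pool.
     * requestedThreads: the fixed per-target count that would otherwise be used.
     * maxConcurrentMoves: assumed per-datanode cap on concurrent moves.
     */
    static int cappedPoolSize(int requestedThreads, int maxConcurrentMoves) {
        if (requestedThreads < 1 || maxConcurrentMoves < 1) {
            throw new IllegalArgumentException("thread counts must be positive");
        }
        // Never size the pool above the cap; extra threads could only
        // produce moves that get skipped.
        return Math.min(requestedThreads, maxConcurrentMoves);
    }

    /** Builds a fixed-size pool for one target using the capped size. */
    static ExecutorService newTargetPool(int requestedThreads, int maxConcurrentMoves) {
        return Executors.newFixedThreadPool(
            cappedPoolSize(requestedThreads, maxConcurrentMoves));
    }

    public static void main(String[] args) {
        // Example values only; they are not taken from any real cluster.
        ExecutorService pool = newTargetPool(200, 50);
        System.out.println("pool size = " + cappedPoolSize(200, 50));
        pool.shutdown();
    }
}
```

With a request above the cap, the pool is sized to the cap; with a request 
below it, the requested size is used unchanged.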

> Improve balancer usability after HDFS-8818
> ------------------------------------------
>
>                 Key: HDFS-11742
>                 URL: https://issues.apache.org/jira/browse/HDFS-11742
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Blocker
>              Labels: release-blocker
>         Attachments: balancer2.8.png, HDFS-11742.branch-2.8.patch, 
> HDFS-11742.branch-2.patch, HDFS-11742.trunk.patch, HDFS-11742.v2.trunk.patch
>
>
> We ran 2.8 balancer with HDFS-8818 on a 280-node and a 2,400-node cluster. In 
> both cases, it would hang forever after two iterations. The two iterations 
> were also moving things at a significantly lower rate. The hang itself is 
> fixed by HDFS-11377, but the design limitation remains, so the balancer 
> throughput ends up actually lower.
> Instead of reverting HDFS-8818 as originally suggested, I am making a small 
> change to make it less error prone and more usable.


