[ 
https://issues.apache.org/jira/browse/HDFS-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994001#comment-15994001
 ] 

Kihwal Lee commented on HDFS-11742:
-----------------------------------

!https://issues.apache.org/jira/secure/attachment/12866086/balancer2.8.png!

I ran Balancer with the suggested revert and then with HDFS-11377.  I won't 
bother to post plots for pre-HDFS-11377. It barely registers.  As you can see, 
it is still not as good as the revert.  Sure, it might work well for certain 
cases, but it clearly performs poorly on all the clusters we have tried.  If 
2.8.1 is put up for vote with this, I will have to -1 the release.

bq. you may change it by setting dfs.datanode.balance.max.concurrent.moves.
It is not feasible to tune it per cluster. 

bq. What we need is HDFS-7639
I agree that dispatching needs to be asynchronous, but I don't see HDFS-8818 
as a stepping stone or prerequisite.  Since we are trying to release 2.8.1, I 
suggest HDFS-8818 be reverted and the improvement be redesigned.

> Revert the core changes from HDFS-8818
> --------------------------------------
>
>                 Key: HDFS-11742
>                 URL: https://issues.apache.org/jira/browse/HDFS-11742
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Blocker
>         Attachments: balancer2.8.png, HDFS-11742.branch-2.8.patch, 
> HDFS-11742.branch-2.patch, HDFS-11742.trunk.patch
>
>
> This is to revert the core changes made by HDFS-8818. The reason is explained 
> in the jira comments.  HDFS-8818 put in config and logging changes that are 
> tied to the core change. I will leave them as is.
> We ran the 2.8 balancer with HDFS-8818 on a 280-node and a 2,400-node 
> cluster. In both cases, it would hang forever after two iterations. Those 
> two iterations also moved data at a significantly lower rate. The hang 
> itself is fixed by HDFS-11377, but the design limitation remains, so the 
> balancer throughput actually ends up lower.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

