[ 
https://issues.apache.org/jira/browse/HBASE-16393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416327#comment-15416327
 ] 

Guanghao Zhang commented on HBASE-16393:
----------------------------------------

+1 on this idea. We found this in our production cluster, too. The balancer is 
too slow when there are a lot of regions. And some default balancer configs is 
too small for big cluster. Maybe we can make the default config value related 
to regions number.

> Improve computeHDFSBlocksDistribution
> -------------------------------------
>
>                 Key: HBASE-16393
>                 URL: https://issues.apache.org/jira/browse/HBASE-16393
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: binlijin
>         Attachments: HBASE-16393.patch
>
>
> With our cluster is big, i can see the balancer is slow from time to time. 
> And the balancer will be called on master startup, so we can see the startup 
> is slow also. 
> The first thing i think whether if we can parallel compute different region's 
> HDFSBlocksDistribution. 
> The second i think we can improve compute single region's 
> HDFSBlocksDistribution.
> When to compute a storefile's HDFSBlocksDistribution first we call 
> FileSystem#getFileStatus(path) and then 
> FileSystem#getFileBlockLocations(status, start, length), so two namenode rpc 
> call for every storefile. Instead we can use FileSystem#listLocatedStatus to 
> get a LocatedFileStatus for the information we need, so reduce the namenode 
> rpc call to one. This can speed the computeHDFSBlocksDistribution, but also 
> send out less rpc call to namenode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to