[jira] [Updated] (HBASE-16393) Improve computeHDFSBlocksDistribution

binlijin (JIRA) Wed, 10 Aug 2016 17:59:11 -0700

     [ 
https://issues.apache.org/jira/browse/HBASE-16393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


binlijin updated HBASE-16393:
-----------------------------
    Description: 
Our cluster is large, and I can see that the balancer is slow from time to 
time. The balancer is also called on master startup, so startup is slow as 
well. 
First, I wonder whether we can compute the HDFSBlocksDistribution of different 
regions in parallel. 
Second, I think we can speed up computing a single region's 
HDFSBlocksDistribution.
To compute a storefile's HDFSBlocksDistribution, we first call 
FileSystem#getFileStatus(path) and then 
FileSystem#getFileBlockLocations(status, start, length), which means two 
namenode RPC calls for every storefile. Instead, we can use 
FileSystem#listLocatedStatus to get a LocatedFileStatus that carries the 
information we need, reducing the namenode RPC calls to one. This not only 
speeds up computeHDFSBlocksDistribution but also sends fewer RPC calls to the 
namenode.
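
The difference between the two-call and one-call paths can be illustrated 
with a small self-contained model. This is only a sketch and not the attached 
patch: FakeNameNode and Block are hypothetical stand-ins, not Hadoop classes. 
The three methods merely mirror the names of the FileSystem calls mentioned 
above, and each one counts as a single simulated namenode RPC.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BlockDistributionSketch {
    /** Hypothetical stand-in for one HDFS block: its length and replica hosts. */
    static class Block {
        final long length;
        final List<String> hosts;

        Block(long length, List<String> hosts) {
            this.length = length;
            this.hosts = hosts;
        }
    }

    /** Toy namenode (an assumption for illustration); every method is one RPC. */
    static class FakeNameNode {
        private final Map<String, List<Block>> files;
        int rpcCalls = 0;

        FakeNameNode(Map<String, List<Block>> files) {
            this.files = files;
        }

        // Models FileSystem#getFileStatus: returns only the file length here.
        long getFileStatus(String path) {
            rpcCalls++;
            long total = 0;
            for (Block b : files.get(path)) {
                total += b.length;
            }
            return total;
        }

        // Models FileSystem#getFileBlockLocations(status, start, length).
        List<Block> getFileBlockLocations(String path, long start, long length) {
            rpcCalls++;
            return files.get(path);
        }

        // Models FileSystem#listLocatedStatus: status and locations in one call.
        List<Block> listLocatedStatus(String path) {
            rpcCalls++;
            return files.get(path);
        }
    }

    /** Sums, per host, the bytes of every block replica that host holds. */
    static Map<String, Long> hostWeights(List<Block> blocks) {
        Map<String, Long> weights = new HashMap<>();
        for (Block b : blocks) {
            for (String host : b.hosts) {
                weights.merge(host, b.length, Long::sum);
            }
        }
        return weights;
    }

    /** Current approach: status first, then block locations (two RPCs). */
    static Map<String, Long> twoCallPath(FakeNameNode nn, String path) {
        long len = nn.getFileStatus(path);
        return hostWeights(nn.getFileBlockLocations(path, 0, len));
    }

    /** Proposed approach: located status in a single RPC. */
    static Map<String, Long> oneCallPath(FakeNameNode nn, String path) {
        return hostWeights(nn.listLocatedStatus(path));
    }

    public static void main(String[] args) {
        Map<String, List<Block>> files = new HashMap<>();
        files.put("/storefile", Arrays.asList(
                new Block(128, Arrays.asList("host1", "host2")),
                new Block(64, Arrays.asList("host2", "host3"))));

        FakeNameNode before = new FakeNameNode(files);
        FakeNameNode after = new FakeNameNode(files);
        twoCallPath(before, "/storefile");
        oneCallPath(after, "/storefile");
        System.out.println("two-call path RPCs: " + before.rpcCalls);
        System.out.println("one-call path RPCs: " + after.rpcCalls);
    }
}
```

Both paths produce the same per-host weights in this model; only the number 
of simulated RPCs per storefile differs.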

  was:
Our cluster is large, and I can see that the balancer is slow from time to 
time. The balancer is also called on master startup, so startup is slow as 
well. 
First, I wonder whether we can compute the HDFSBlocksDistribution of different 
regions in parallel. 
Second, I think we can speed up computing a single region's 
HDFSBlocksDistribution.
To compute a storefile's HDFSBlocksDistribution, we first call 
FileSystem#getFileStatus(path) and then 
FileSystem#getFileBlockLocations(status, start, length), which means two 
namenode RPC calls for every storefile. Instead, we can use 
FileSystem#listLocatedStatus to get a LocatedFileStatus that carries the 
information we need, reducing the namenode RPC calls to one.


> Improve computeHDFSBlocksDistribution
> -------------------------------------
>
>                 Key: HBASE-16393
>                 URL: https://issues.apache.org/jira/browse/HBASE-16393
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: binlijin
>         Attachments: HBASE-16393.patch
>
>
> Our cluster is large, and I can see that the balancer is slow from time to 
> time. The balancer is also called on master startup, so startup is slow as 
> well. 
> First, I wonder whether we can compute the HDFSBlocksDistribution of 
> different regions in parallel. 
> Second, I think we can speed up computing a single region's 
> HDFSBlocksDistribution.
> To compute a storefile's HDFSBlocksDistribution, we first call 
> FileSystem#getFileStatus(path) and then 
> FileSystem#getFileBlockLocations(status, start, length), which means two 
> namenode RPC calls for every storefile. Instead, we can use 
> FileSystem#listLocatedStatus to get a LocatedFileStatus that carries the 
> information we need, reducing the namenode RPC calls to one. This not only 
> speeds up computeHDFSBlocksDistribution but also sends fewer RPC calls to 
> the namenode.
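
The first idea in the description, computing the distribution of different 
regions in parallel, could look roughly like the sketch below. It is an 
illustration under assumptions rather than HBase code: computeAll, the 
computeOne function and the thread count are hypothetical names, and the 
per-region work is passed in as a plain function so the example stays 
self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelDistributionSketch {
    /**
     * Runs computeOne for every region on a fixed-size thread pool and returns
     * the results keyed by region, in the order the regions were given.
     */
    static <R, D> Map<R, D> computeAll(List<R> regions,
                                       Function<R, D> computeOne,
                                       int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per region so the slow calls overlap.
            List<Future<D>> futures = new ArrayList<>();
            for (R region : regions) {
                Callable<D> task = () -> computeOne.apply(region);
                futures.add(pool.submit(task));
            }
            // Collect the results in submission order.
            Map<R, D> result = new LinkedHashMap<>();
            for (int i = 0; i < regions.size(); i++) {
                result.put(regions.get(i), futures.get(i).get());
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while computing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("region computation failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // String::length stands in for the real per-region computation.
        List<String> regions = Arrays.asList("region-a", "region-b", "region-c");
        Map<String, Integer> out = computeAll(regions, String::length, 2);
        System.out.println(out);
    }
}
```

A bounded pool is used here so that the number of concurrent namenode 
requests stays limited; the right pool size would depend on the cluster.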



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
