[ https://issues.apache.org/jira/browse/HBASE-16393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
binlijin updated HBASE-16393:
-----------------------------
Attachment: HBASE-16393.patch
> Improve computeHDFSBlocksDistribution
> -------------------------------------
>
> Key: HBASE-16393
> URL: https://issues.apache.org/jira/browse/HBASE-16393
> Project: HBase
> Issue Type: Improvement
> Reporter: binlijin
> Attachments: HBASE-16393.patch
>
>
> Our cluster is big, and I can see that the balancer is slow from time to
> time. The balancer is also called on master startup, so startup is slow as
> well.
> First, I think we could compute the HDFSBlocksDistribution of different
> regions in parallel.
> Second, I think we can speed up computing a single region's
> HDFSBlocksDistribution.
> To compute a storefile's HDFSBlocksDistribution, we currently call
> FileSystem#getFileStatus(path) and then
> FileSystem#getFileBlockLocations(status, start, length), which is two
> namenode RPC calls for every storefile. Instead we can use
> FileSystem#listLocatedStatus to get a LocatedFileStatus with the information
> we need, which reduces the namenode RPC calls to one.
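
The two ideas above can be sketched roughly as follows. This is a minimal,
self-contained model and not the attached patch: FakeNameNode, Block,
regionDistribution and allRegions are hypothetical names invented for
illustration, and the fake namenode only counts round trips instead of
talking to HDFS. In real code the block list would presumably come from
LocatedFileStatus#getBlockLocations() on the entry returned by
FileSystem#listLocatedStatus(path).

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

class BlockDistributionSketch {
    /** One HDFS block: its length in bytes and the hosts holding a replica. */
    static final class Block {
        final long length;
        final List<String> hosts;
        Block(long length, List<String> hosts) {
            this.length = length;
            this.hosts = hosts;
        }
    }

    /** Hypothetical stand-in for the namenode; it only counts round trips. */
    static final class FakeNameNode {
        final Map<String, List<Block>> files = new ConcurrentHashMap<>();
        final AtomicInteger rpcCalls = new AtomicInteger();

        // Models getFileStatus followed by getFileBlockLocations: two round trips.
        List<Block> statusThenLocations(String path) {
            rpcCalls.incrementAndGet(); // stands for getFileStatus
            rpcCalls.incrementAndGet(); // stands for getFileBlockLocations
            return files.getOrDefault(path, Collections.emptyList());
        }

        // Models listLocatedStatus: status and block locations in one round trip.
        List<Block> locatedStatus(String path) {
            rpcCalls.incrementAndGet();
            return files.getOrDefault(path, Collections.emptyList());
        }
    }

    /** Sums block bytes per host over one region's store files. */
    static Map<String, Long> regionDistribution(FakeNameNode nn, List<String> storeFiles) {
        Map<String, Long> weights = new HashMap<>();
        for (String path : storeFiles) {
            for (Block block : nn.locatedStatus(path)) {
                for (String host : block.hosts) {
                    weights.merge(host, block.length, Long::sum);
                }
            }
        }
        return weights;
    }

    /** Computes every region's distribution on a fixed-size thread pool. */
    static Map<String, Map<String, Long>> allRegions(
            FakeNameNode nn, Map<String, List<String>> regions, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per region, then collect results in submission order.
            Map<String, Future<Map<String, Long>>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, List<String>> e : regions.entrySet()) {
                pending.put(e.getKey(), pool.submit(() -> regionDistribution(nn, e.getValue())));
            }
            Map<String, Map<String, Long>> result = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Map<String, Long>>> e : pending.entrySet()) {
                result.put(e.getKey(), e.getValue().get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling allRegions(nn, regions, threads) with a map from region name to
store file paths returns one host-to-bytes map per region. The pool size is
a tuning choice; a very large pool would likely just shift the load onto the
namenode.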
--
This message was sent by Atlassian JIRA (v6.3.4#6332)