binlijin created HBASE-16393:
--------------------------------
Summary: Improve computeHDFSBlocksDistribution
Key: HBASE-16393
URL: https://issues.apache.org/jira/browse/HBASE-16393
Project: HBase
Issue Type: Improvement
Reporter: binlijin
Because our cluster is big, I can see that the balancer is slow from time to
time. The balancer is also called on master startup, so startup is slow as
well.
The first idea is to compute the HDFSBlocksDistribution of different regions
in parallel.
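A minimal sketch of the parallel idea, using only the JDK. The class name, the
Map<String, Long> host-to-bytes representation, and the per-region Function are
hypothetical stand-ins, not HBase's actual API; in real code the per-region
function would perform the HDFS lookups.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelDistributionSketch {

  /**
   * Computes one distribution per region on a fixed-size thread pool.
   * computeOne is a hypothetical stand-in for the per-region HDFS lookup.
   */
  public static Map<String, Map<String, Long>> computeAll(
      List<String> regions,
      Function<String, Map<String, Long>> computeOne,
      int threads) throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // Submit every region first so the lookups can overlap.
      List<Future<Map<String, Long>>> futures = new ArrayList<>();
      for (String region : regions) {
        futures.add(pool.submit(() -> computeOne.apply(region)));
      }
      // Collect results in submission order.
      Map<String, Map<String, Long>> result = new LinkedHashMap<>();
      for (int i = 0; i < regions.size(); i++) {
        result.put(regions.get(i), futures.get(i).get());
      }
      return result;
    } finally {
      pool.shutdown();
    }
  }
}
```

The pool size bounds how many namenode requests are in flight at once, so it
would likely need to be configurable on a large cluster.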
The second idea is to speed up computing a single region's
HDFSBlocksDistribution.
To compute a storefile's HDFSBlocksDistribution, we first call
FileSystem#getFileStatus(path) and then
FileSystem#getFileBlockLocations(status, start, length), which is two namenode
RPC calls for every storefile. Instead, we can use
FileSystem#listLocatedStatus to get a LocatedFileStatus with the information
we need, which reduces this to one namenode RPC call.
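The RPC saving can be illustrated with a toy model. This sketch does not call
Hadoop: ToyNameNode, Block, and hostWeights are hypothetical names, and the
three ToyNameNode methods only mimic the FileSystem calls named above while
counting one RPC each.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BlockLookupSketch {

  /** Hypothetical block record: replica hosts plus block length in bytes. */
  public static final class Block {
    public final List<String> hosts;
    public final long length;
    public Block(List<String> hosts, long length) {
      this.hosts = hosts;
      this.length = length;
    }
  }

  /** Toy namenode that serves block lists and counts the RPCs it receives. */
  public static final class ToyNameNode {
    public final Map<String, List<Block>> files = new HashMap<>();
    public int rpcCount = 0;

    // Mimics FileSystem#getFileStatus: one RPC, returns the file length.
    public long getFileStatus(String path) {
      rpcCount++;
      long total = 0;
      for (Block b : files.get(path)) {
        total += b.length;
      }
      return total;
    }

    // Mimics FileSystem#getFileBlockLocations: a second RPC.
    // The toy ignores the range and returns every block of the file.
    public List<Block> getFileBlockLocations(String path, long start, long len) {
      rpcCount++;
      return files.get(path);
    }

    // Mimics FileSystem#listLocatedStatus: status and locations in one RPC.
    public List<Block> listLocatedStatus(String path) {
      rpcCount++;
      return files.get(path);
    }
  }

  /** Sums block bytes per host, the core of a block distribution. */
  public static Map<String, Long> hostWeights(List<Block> blocks) {
    Map<String, Long> weights = new HashMap<>();
    for (Block b : blocks) {
      for (String host : b.hosts) {
        weights.merge(host, b.length, Long::sum);
      }
    }
    return weights;
  }

  /** Current approach: two RPCs per storefile. */
  public static Map<String, Long> twoCall(ToyNameNode nn, String path) {
    long len = nn.getFileStatus(path);
    return hostWeights(nn.getFileBlockLocations(path, 0, len));
  }

  /** Proposed approach: one RPC per storefile. */
  public static Map<String, Long> oneCall(ToyNameNode nn, String path) {
    return hostWeights(nn.listLocatedStatus(path));
  }
}
```

Both paths yield the same per-host weights; only the number of namenode round
trips differs, so for N storefiles the proposal saves N RPCs.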
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)