[ https://issues.apache.org/jira/browse/HBASE-16398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvel Thirumoolan updated HBASE-16398:
-----------------------------------------
    Attachment: LocatedBlockStatusComparison.java

Uploading client code that you can run on your cluster to see the performance
difference and check the correctness of the logic.

The approach is very similar to yours, but I took care not to create
unnecessary objects, since this usually runs over a lot of regions. I also
took care of references and links.

There is also a verification section where I compare the old and new
mechanisms. Because splits happen all the time, I had to run a second
verification whenever the first one fails.

Let me know if you have any questions.

> optimize HRegion computeHDFSBlocksDistribution
> ----------------------------------------------
>
>                 Key: HBASE-16398
>                 URL: https://issues.apache.org/jira/browse/HBASE-16398
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: binlijin
>            Assignee: binlijin
>         Attachments: HBASE-16398.patch, LocatedBlockStatusComparison.java
>
>
> First, I assume there are no references or links in a region family's
> directory.
> Without the patch, computeHDFSBlocksDistribution for a region family makes
> 1+2*N RPC calls, where N is the number of hfiles. The first RPC call is
> DistributedFileSystem#listStatus to get the hfiles; then, for every hfile,
> there are two RPC calls: DistributedFileSystem#getFileStatus(path) and
> DistributedFileSystem#getFileBlockLocations(status, start, length).
> With the patch, computeHDFSBlocksDistribution for a region family makes
> 2 RPC calls: DistributedFileSystem#getFileStatus(path) and
> DistributedFileSystem#listLocatedStatus(final Path p, final PathFilter
> filter).
> So if there is at least one hfile, the patch results in fewer RPC calls.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
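
For reference, the RPC counts quoted in the issue description can be tabulated with a small sketch. It is only arithmetic on the figures stated above (1+2*N without the patch, 2 with it) and assumes, as the description does, that the family directory contains no references or links. The class and method names are hypothetical, and nothing here calls HDFS or HBase.

```java
// Hypothetical helper: models the RPC counts stated in the issue description.
// It does not touch HDFS; it only restates the arithmetic.
class RpcCallCount {

    // Without the patch: one listStatus call, then getFileStatus and
    // getFileBlockLocations for each of the N hfiles.
    static long withoutPatch(int hfileCount) {
        return 1L + 2L * hfileCount;
    }

    // With the patch: one getFileStatus call and one listLocatedStatus call,
    // independent of the number of hfiles.
    static long withPatch(int hfileCount) {
        return 2L;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 5, 50};
        for (int n : samples) {
            System.out.printf("N=%d without=%d with=%d%n",
                    n, withoutPatch(n), withPatch(n));
        }
    }
}
```

With N = 0 the patched path makes one more call than the old one, which is why the description qualifies the saving with "at least one hfile"; from N = 1 onward the old path's count grows linearly while the patched count stays at 2.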