[
https://issues.apache.org/jira/browse/HBASE-24298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17158851#comment-17158851
]
star edited comment on HBASE-24298 at 7/16/20, 3:08 AM:
--------------------------------------------------------
A prototype implementation for review. It is a configurable feature of the HBase
client. You can activate the optimization per table by setting a property as
follows.
{code:java}
// Enable the ByteCopyOnWriteArrayMap optimization for table 'user'.
conf.set("hbase.client.meta.cache.optimize.user", "4,2");{code}
The HBase client will cache the meta data of table 'user' and locate regions in
an optimized way. "4,2" means skipping 4 bytes and using the following 2 bytes
to narrow the range of the binary search.
The optimization reduces CPU time by 50% in a simple performance test case taken
from our production environment, TestByteCopyOnWriteMaps#testLowerKeyPerformance.
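As a rough illustration of how a "4,2" value could be interpreted (skip 4 bytes, then use the following 2 bytes as the lookup prefix), here is a minimal sketch. The class PrefixSpec and its methods are hypothetical names made up for this example and are not taken from the attached patch. Treating bytes that are missing from a short key as zero is also an assumption.

```java
// Hypothetical helper, not part of the HBase client or the attached patch.
class PrefixSpec {
    final int skip;    // number of leading row-key bytes to ignore
    final int length;  // number of bytes after the skipped part used as prefix

    PrefixSpec(int skip, int length) {
        if (skip < 0 || length < 1 || length > 3) {
            throw new IllegalArgumentException("unsupported spec: " + skip + "," + length);
        }
        this.skip = skip;
        this.length = length;
    }

    // Parses a "skip,length" property value such as "4,2".
    static PrefixSpec parse(String value) {
        String[] parts = value.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected \"skip,length\" but got: " + value);
        }
        return new PrefixSpec(Integer.parseInt(parts[0].trim()),
                              Integer.parseInt(parts[1].trim()));
    }

    // Returns the selected bytes as an unsigned big-endian integer.
    // Assumption: positions past the end of the key count as 0.
    int prefixOf(byte[] rowKey) {
        int prefix = 0;
        for (int i = 0; i < length; i++) {
            int pos = skip + i;
            int b = pos < rowKey.length ? (rowKey[pos] & 0xFF) : 0;
            prefix = (prefix << 8) | b;
        }
        return prefix;
    }
}
```

With this sketch, PrefixSpec.parse("4,2") ignores the first four bytes of a row key and combines bytes five and six into a value between 0 and 65535, which could then index a table of search ranges.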
was (Author: starphin):
A prototype implementation for review. It is a configurable feature of the HBase
client. You can activate the optimization per table by setting a property as
follows.
{code:java}
// Enable the ByteCopyOnWriteArrayMap optimization for table 'user'.
conf.set("hbase.client.meta.cache.optimize.user", "4,2");{code}
The HBase client will cache the meta data of table 'user' and locate regions in
an optimized way. "4,2" means skipping 4 bytes and using the following 2 bytes
to narrow the range of the binary search.
> Reduce cpu load of locating region especially in batch mode.
> ------------------------------------------------------------
>
> Key: HBASE-24298
> URL: https://issues.apache.org/jira/browse/HBASE-24298
> Project: HBase
> Issue Type: Bug
> Reporter: star
> Assignee: star
> Priority: Major
> Attachments: HBASE-24298.patch, locating region.svg
>
>
> Binary search is used to speed up the process of locating a region. It is
> already fast, yet the CPU of the HBase client becomes the bottleneck when
> running the YCSB benchmark. We can make region location faster and
> reduce CPU load in some special cases, which happen to be the common case in
> our production environment. The conditions are:
> 1. Predefined splits in uniform distribution.
>
> 2. Load data in batch mode.
> The optimization is very simple: narrow the range of the binary search.
> Initially, record the startIndex and endIndex for each value of the first one
> or two bytes of the keys. When a key is looked up, find the narrowed
> startIndex and endIndex for that key, then run the normal binary search
> within the specified startIndex and endIndex.
> Ideally, this reduces CPU cost to 1/8 with 1 byte or 1/16 with 2 bytes.
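The range-narrowing idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the code in HBASE-24298.patch: the class name PrefixNarrowedSearch is hypothetical, it indexes only the first byte of each key (no skipped bytes), it assumes the region start keys are already sorted in unsigned lexicographic order, and it relies on Arrays.compareUnsigned, which requires Java 9 or later.

```java
import java.util.Arrays;

// Hypothetical sketch: a first-byte index that narrows a floor binary search.
class PrefixNarrowedSearch {
    private final byte[][] keys;                    // sorted region start keys
    private final int[] firstIndex = new int[257];  // firstIndex[p] = number of keys whose first byte < p

    PrefixNarrowedSearch(byte[][] sortedKeys) {
        this.keys = sortedKeys;
        int[] counts = new int[256];
        for (byte[] k : sortedKeys) {
            counts[firstByte(k)]++;
        }
        // Prefix sums turn per-byte counts into start offsets.
        for (int p = 0; p < 256; p++) {
            firstIndex[p + 1] = firstIndex[p] + counts[p];
        }
    }

    // Assumption: an empty key is treated as if its first byte were 0.
    static int firstByte(byte[] key) {
        return key.length == 0 ? 0 : (key[0] & 0xFF);
    }

    // Returns the index of the greatest key <= row, or -1 if there is none.
    int floorIndex(byte[] row) {
        int p = firstByte(row);
        int lo = firstIndex[p];          // first key sharing the row's first byte
        int hi = firstIndex[p + 1] - 1;  // last key sharing the row's first byte
        int result = lo - 1;             // every key with a smaller first byte is < row
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (Arrays.compareUnsigned(keys[mid], row) <= 0) {
                result = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return result;
    }
}
```

The binary search itself is unchanged; only its initial bounds come from the lookup table. With uniformly distributed predefined splits, each first-byte bucket holds roughly 1/256 of the keys, so the search starts from a much smaller range.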