[
https://issues.apache.org/jira/browse/HBASE-26590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17464196#comment-17464196
]
Huaxiang Sun commented on HBASE-26590:
--------------------------------------
Thanks [~zhangduo].
For master, I think 10 is fine: all results are cached in the meta cache, so
they are not wasted.
For hbase-2, the extra 4 results are not cached, which is a bit of a concern.
The issue happened during a job restart, when ~700 HBase clients started at the
same time with empty meta caches, causing a meta scan storm; there are ~300k
regions in the meta table. I am not sure at this moment that this is the main
factor, as my testing shows far less impact than what was observed in the
production job.
Some background info:
The cluster is stable, with no region moves.
Meta replica Load Balance mode is enabled on the 2.4.5 client side. The meta
replica region server is fully synced with the primary region because the
cluster is stable. In my tests, meta scans going through the meta replica
region do not cause a performance regression.
On my test cluster, I can reproduce a small regression (5 ~ 10%) with a
RandomGet test using the 2.4.5 NettyRpcClient. After switching to
BlockingRpcClient, this regression is gone.
I will submit this minor improvement patch and work with the production team
again to see whether the patch and the new BlockingRpcClient config bring any
improvement.
If the meta replica region were out of sync with the primary region, there
would be many stale region locations, resulting in NotServingRegionException,
and the client would retry against the primary meta region. That would cause
serious latency issues, but it is not the case here. Anyway, I will keep an eye
on it when we retry with the new 2.4.5 client.
> Hbase-client Meta lookup performance regression between hbase-1 and hbase-2
> ---------------------------------------------------------------------------
>
> Key: HBASE-26590
> URL: https://issues.apache.org/jira/browse/HBASE-26590
> Project: HBase
> Issue Type: Improvement
> Components: meta
> Affects Versions: 2.4.0, 2.5.0, 2.3.7, 2.6.0
> Reporter: Huaxiang Sun
> Assignee: Huaxiang Sun
> Priority: Major
>
> One of our users complained of higher latency during app restart after their
> application upgraded from the hbase-1.2 client (CDH-5.16.2) to the
> hbase-2.4.5 client with meta replica Load Balance mode. I reproduced the
> regression with a meta lookup test.
> On my test cluster, there are 160k regions for the test table, so there are
> 160k entries in the meta region. I used one thread to do 1 million meta
> lookups against the meta region server.
>
> ||Version||Meta Replica Load Balance Enabled||Time||
> |2.4.5-with-fixed|Yes|336458ms|
> |2.4.5-with-fixed|No|333253ms|
> |2.4.5|Yes|469980ms|
> |2.4.5|No|470515ms|
> |*cdh-5.16.2*|*No*|*323412ms*|
>
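To put the table above in perspective, here is a rough sketch that converts each total into a per-lookup time and a slowdown relative to the cdh-5.16.2 run. The timings and the 1 million lookup count are taken from the description; the helper names (`per_lookup_ms`, `slowdown_pct`) are only illustrative and are not part of any HBase tooling.

```python
# Totals copied from the table in the issue description:
# wall-clock time in ms for 1 million single-threaded meta lookups.
LOOKUPS = 1_000_000
BASELINE_MS = 323412  # cdh-5.16.2, meta replica Load Balance disabled

timings_ms = {
    ("2.4.5-with-fixed", "Yes"): 336458,
    ("2.4.5-with-fixed", "No"): 333253,
    ("2.4.5", "Yes"): 469980,
    ("2.4.5", "No"): 470515,
    ("cdh-5.16.2", "No"): BASELINE_MS,
}


def per_lookup_ms(total_ms, lookups=LOOKUPS):
    """Average time of a single meta lookup, in milliseconds."""
    return total_ms / lookups


def slowdown_pct(total_ms, baseline_ms=BASELINE_MS):
    """Extra time relative to the baseline run, as a percentage."""
    return (total_ms / baseline_ms - 1.0) * 100.0


# Print one summary line per table row.
for (version, load_balance), total in timings_ms.items():
    print(
        f"{version:17s} LB={load_balance:3s} "
        f"{per_lookup_ms(total):.3f} ms/lookup "
        f"{slowdown_pct(total):+.1f}% vs cdh-5.16.2"
    )
```

On these numbers, unpatched 2.4.5 is roughly 45% slower than the cdh-5.16.2 client in this single-threaded test, while the fixed build is within about 3 to 4% of it, with or without meta replica Load Balance mode.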