[ 
https://issues.apache.org/jira/browse/HBASE-25510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266600#comment-17266600
 ] 

Viraj Jasani commented on HBASE-25510:
--------------------------------------

Thanks for filing this Jira [~zhengzhuobinzzb]. Since you have raised a PR and 
are working on this, I have given you Jira contributor access and assigned 
this Jira to you. Going forward, you will be able to assign Jiras to 
yourself.

> Optimize TableName.valueOf from O(n) to O(1).  We can get benefits when the 
> number of tables in the cluster is greater than dozens
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-25510
>                 URL: https://issues.apache.org/jira/browse/HBASE-25510
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Replication
>    Affects Versions: 1.2.12, 1.4.13, 2.4.1
>            Reporter: zhuobin zheng
>            Assignee: zhuobin zheng
>            Priority: Major
>         Attachments: optimiz_benchmark, origin_benchmark, stucks-profile-info
>
>
> Currently, TableName.valueOf searches the cache for a TableName object 
> linearly (code shown below), so it is too slow when the cluster has 
> thousands of tables.
> {code:java}
> // Linear scan: compares the requested name against every cached TableName
> for (TableName tn : tableCache) {
>   if (Bytes.equals(tn.getQualifier(), qns)
>       && Bytes.equals(tn.getNamespace(), bns)) {
>     return tn;
>   }
> }{code}
> I store the objects in a hash table instead, so they can be looked up more 
> quickly. The code looks like this:
> {code:java}
> // Hash lookup keyed by the table name string
> TableName oldTable = tableCache.get(nameAsStr);{code}
>  
> Our cluster has tens of thousands of tables (most of them are KYLIN 
> tables). 
>  We found that in the following two cases the TableName.valueOf method 
> severely restricts performance.
>   
>  Common premise: tens of thousands of tables in the cluster.
>  Cause: TableName.valueOf performs poorly because it has to traverse the 
> whole cache linearly.
>   
>  Case 1. Replication
>  Premise 1: one table is written at high QPS with small values and 
> non-batched requests, which produces a very large number of WAL entries.
>  Premise 2: deserializing a WAL entry calls the TableName.valueOf method.
>  Result: replication gets stuck and a lot of WAL files pile up.
>  
>  Case 2. Active master startup
>  NamespaceStateManager initialization initializes every RegionInfo, and each 
> RegionInfo initialization calls TableName.valueOf, so startup takes longer 
> when TableName.valueOf is slow.
>   
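
The difference between the two lookups described in the issue can be sketched in plain Java. This is only a minimal illustration under stated assumptions, not the HBase patch: the class TableNameCacheSketch, the nested Name type, and both valueOf* methods are hypothetical stand-ins, and the "namespace:qualifier" key format is assumed for the example.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical stand-in for a TableName cache; not HBase's actual code. */
public class TableNameCacheSketch {

  /** Minimal immutable name holder (hypothetical; the real TableName has more state). */
  static final class Name {
    final String namespace;
    final String qualifier;

    Name(String namespace, String qualifier) {
      this.namespace = namespace;
      this.qualifier = qualifier;
    }

    String nameAsString() {
      return namespace + ":" + qualifier;
    }
  }

  // Linear cache: every lookup may scan all cached names, O(n).
  static final List<Name> linearCache = new CopyOnWriteArrayList<>();

  // Hash cache keyed by "namespace:qualifier" (assumed key format): expected O(1) per lookup.
  static final ConcurrentHashMap<String, Name> hashCache = new ConcurrentHashMap<>();

  /** Scans the whole list before creating and caching a new instance. */
  static synchronized Name valueOfLinear(String namespace, String qualifier) {
    for (Name n : linearCache) {
      if (n.qualifier.equals(qualifier) && n.namespace.equals(namespace)) {
        return n;
      }
    }
    Name created = new Name(namespace, qualifier);
    linearCache.add(created);
    return created;
  }

  /** Looks up by key; creates and caches the instance atomically on a miss. */
  static Name valueOfHashed(String namespace, String qualifier) {
    String key = namespace + ":" + qualifier;
    return hashCache.computeIfAbsent(key, k -> new Name(namespace, qualifier));
  }
}
```

Both methods return the same cached instance for repeated calls with the same namespace and qualifier; only the cost of finding it differs as the number of cached names grows.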



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
