[ 
https://issues.apache.org/jira/browse/HBASE-23068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Feng updated HBASE-23068:
------------------------------
    Description: 
This is not an HBase issue, please delete it. 

Sorry.

  was:
WhereOptimizer.pushKeyExpressionsToScan() contains the line 
"extractNodes.addAll(nodesToExtract)". When executing SQL such as "select * from 
... where A in (a1, a2, ..., a_n) and B = X", where the IN list for A has N 
(N > 100,000) elements, this line runs slowly (> 90s in our environment).

This is because, in this case, extractNodes is a HashSet and nodesToExtract is a 
List of N InListExpression entries (all N entries are the same instance), and 
InListExpression.values has N elements as well.

HashSet.addAll(List<InListExpression>) calls InListExpression.hashCode() N 
times. Each call computes the hash code of every value, so the overall time 
complexity is O(N^2).

A simple fix is to cache the hashCode of InListExpression and return the cached 
value directly once it has been calculated.
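
The caching idea above can be illustrated with a minimal, self-contained sketch. 
InListSketch and HashCacheDemo are hypothetical names chosen for illustration; 
this is not the actual InListExpression code, and it assumes the values never 
change after construction. The counter exists only to show how often the full 
O(N) hash is computed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an expression that holds N values.
final class InListSketch {
    private final List<String> values;
    // Cached hash; 0 means "not computed yet" (same idiom as java.lang.String).
    private int hash;
    // Demonstration-only counter of full O(N) hash computations.
    static int fullHashComputations = 0;

    InListSketch(List<String> values) {
        // Defensive copy: the cache is only valid if the values never change.
        this.values = new ArrayList<>(values);
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            fullHashComputations++;
            h = values.hashCode(); // walks all N values once
            hash = h;              // a genuine hash of 0 is simply recomputed
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof InListSketch)) return false;
        return values.equals(((InListSketch) o).values);
    }
}

class HashCacheDemo {
    public static void main(String[] args) {
        int n = 1000;
        List<String> values = new ArrayList<>();
        for (int i = 0; i < n; i++) values.add("a" + i);

        // N references to the same instance, as in the scenario described above.
        InListSketch expr = new InListSketch(values);
        List<InListSketch> nodesToExtract = new ArrayList<>();
        for (int i = 0; i < n; i++) nodesToExtract.add(expr);

        Set<InListSketch> extractNodes = new HashSet<>();
        extractNodes.addAll(nodesToExtract); // hashCode() is called n times

        System.out.println("set size: " + extractNodes.size());
        System.out.println("full hash computations: "
                + InListSketch.fullHashComputations);
    }
}
```

With the cache, addAll still calls hashCode() once per list entry, but only the 
first call walks the values, so the total work is roughly O(N) instead of O(N^2).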

     Issue Type: Test  (was: Improvement)
       Priority: Trivial  (was: Critical)
        Summary: Please Delete this Issue  (was: Improve performance of 
InListExpression.hashCode)

> Please Delete this Issue
> ------------------------
>
>                 Key: HBASE-23068
>                 URL: https://issues.apache.org/jira/browse/HBASE-23068
>             Project: HBase
>          Issue Type: Test
>            Reporter: Chen Feng
>            Assignee: Chen Feng
>            Priority: Trivial
>
> This is not an HBase issue, please delete it. 
> Sorry.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
