[
https://issues.apache.org/jira/browse/PHOENIX-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chen Feng updated PHOENIX-5491:
-------------------------------
Description:
In WhereOptimizer.pushKeyExpressionsToScan(), has a line of code:
"extractNodes.addAll(nodesToExtract)"
When executing sqls like "select * from ... where A in (a1, a2, ..., a_n) and B
= X", saying A in N (N > 100,000) elements, previous code execution will slow
(> 90s in our environment).
This is because in such case, extractNodes is a HashSet, nodesToExtract is a
List with N InListExpression (the N InListExpressions are the same instance),
each InListExpression.values has N elements as well.
HashSet.addAll(list<InListExpression>) will call N times of
InListExpression.hashCode(). Each time, InListExpression.hashCode() will
calculate hashCode for every value. Therefore, the time complexity will be N^2.
A simple way to solve it is to remember of the hashCode of InListExpression and
returns it directly if calculated once. The query will finish in 5 seconds.
was:
In WhereOptimizer.pushKeyExpressionsToScan(), has a line of code:
"extractNodes.addAll(nodesToExtract)" When executing sqls like "select * from
... where A in (a1, a2, ..., a_n) and B = X", saying A in N (N > 100,000)
elements, previous code execution will slow (> 90s in our environment).
This is because in such case, extractNodes is a HashSet, nodesToExtract is a
List with N InListExpression (the N InListExpressions are the same instance),
each InListExpression.values has N elements as well.
HashSet.addAll(list<InListExpression>) will call N times of
InListExpression.hashCode(). Each time, InListExpression.hashCode() will
calculate hashCode for every value. Therefore, the time complexity will be N^2.
A simple way to solve it is to remember of the hashCode of InListExpression and
returns it directly if calculated once.
> Improve performance of InListExpression.hashCode
> ------------------------------------------------
>
> Key: PHOENIX-5491
> URL: https://issues.apache.org/jira/browse/PHOENIX-5491
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 5.0.0, 4.14.3
> Reporter: Chen Feng
> Assignee: Chen Feng
> Priority: Critical
> Attachments: PHOENIX-5491.patch
>
>
> In WhereOptimizer.pushKeyExpressionsToScan(), has a line of code:
> "extractNodes.addAll(nodesToExtract)"
> When executing sqls like "select * from ... where A in (a1, a2, ..., a_n) and
> B = X", saying A in N (N > 100,000) elements, previous code execution will
> slow (> 90s in our environment).
> This is because in such case, extractNodes is a HashSet, nodesToExtract is a
> List with N InListExpression (the N InListExpressions are the same instance),
> each InListExpression.values has N elements as well.
> HashSet.addAll(list<InListExpression>) will call N times of
> InListExpression.hashCode(). Each time, InListExpression.hashCode() will
> calculate hashCode for every value. Therefore, the time complexity will be
> N^2.
> A simple way to solve it is to remember of the hashCode of InListExpression
> and returns it directly if calculated once. The query will finish in 5
> seconds.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)