[
https://issues.apache.org/jira/browse/KYLIN-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475057#comment-17475057
]
hujiahua commented on KYLIN-5153:
---------------------------------
I found that this issue had been solved in branch kycalcite-1.16.0.x-4.x of
Kyligence/calcite
(https://github.com/Kyligence/calcite/blob/3f70e06e1a4af14acd68f31f5e51bfab8974c499/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L1154-L1159),
but this patch is not used in currently kylin4 version.
> Big in-list query cause slow performance
> -----------------------------------------
>
> Key: KYLIN-5153
> URL: https://issues.apache.org/jira/browse/KYLIN-5153
> Project: Kylin
> Issue Type: Improvement
> Components: Query Engine
> Affects Versions: v4.0.1
> Reporter: hujiahua
> Priority: Major
>
> {code:java}
> select SELLER_ID,sum(PRICE) from KYLIN_SALES where SELLER_ID in
> (10000001,10000002,10000003,10000004,10000005,10000006) GROUP BY SELLER_ID
> {code}
> Current the above SQL will convert to a spark physical plan like this:
> {code:java}
> Project [2#122L AS F__KYLIN_SALES_SELLER_ID__1_4392b2b0__0#128L, 5#124 AS
> F__SUM_PRICE__1_4392b2b0__2#130]
> +- Filter ((((((2#122L = 10000001) || (2#122L = 10000002)) || (2#122L =
> 10000003)) || (2#122L = 10000004)) || (2#122L = 10000005)) || (2#122L =
> 10000006))
> +- FileScan parquet [2#122L,5#124] Batched: false, Format: Parquet,
> Location:
> FilePruner[file:/Users/hujiahua/work/project/yz-kylin/examples/test_case_data/sample_local/defaul...,
> PartitionFilters: [], PushedFilters:
> [Or(Or(Or(Or(Or(EqualTo(2,10000001),EqualTo(2,10000002)),EqualTo(2,10000003)),EqualTo(2,10000004)...,
> ReadSchema: struct<2:bigint,5:decimal(29,4)> {code}
> IN-LIST expression will always convert to OR expression. If the size of LIST
> was relatively small, it work fine. But when the size of LIST get bigger (The
> size value was 1000+ in our production case), it will have performance
> issues (the RT was more than 10 seconds). Too many OR expression cause spend
> too many time in plan optimization phase and spark code generation phase.
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)