[ 
https://issues.apache.org/jira/browse/KYLIN-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475057#comment-17475057
 ] 

hujiahua commented on KYLIN-5153:
---------------------------------

I found that this issue has already been solved in branch kycalcite-1.16.0.x-4.x of 
Kyligence/calcite 
(https://github.com/Kyligence/calcite/blob/3f70e06e1a4af14acd68f31f5e51bfab8974c499/core/src/main/java/org/apache/calcite/sql2rel/SqlToRelConverter.java#L1154-L1159),
but this patch is not used in the current Kylin 4 version.

> Big in-list query cause slow performance 
> -----------------------------------------
>
>                 Key: KYLIN-5153
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5153
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Query Engine
>    Affects Versions: v4.0.1
>            Reporter: hujiahua
>            Priority: Major
>
> {code:java}
> select SELLER_ID,sum(PRICE) from KYLIN_SALES where SELLER_ID in (10000001,10000002,10000003,10000004,10000005,10000006) GROUP BY SELLER_ID
> {code}
> Currently, the above SQL is converted to a Spark physical plan like this:
> {code:java}
> Project [2#122L AS F__KYLIN_SALES_SELLER_ID__1_4392b2b0__0#128L, 5#124 AS F__SUM_PRICE__1_4392b2b0__2#130]
> +- Filter ((((((2#122L = 10000001) || (2#122L = 10000002)) || (2#122L = 10000003)) || (2#122L = 10000004)) || (2#122L = 10000005)) || (2#122L = 10000006))
>    +- FileScan parquet [2#122L,5#124] Batched: false, Format: Parquet, Location: FilePruner[file:/Users/hujiahua/work/project/yz-kylin/examples/test_case_data/sample_local/defaul..., PartitionFilters: [], PushedFilters: [Or(Or(Or(Or(Or(EqualTo(2,10000001),EqualTo(2,10000002)),EqualTo(2,10000003)),EqualTo(2,10000004)..., ReadSchema: struct<2:bigint,5:decimal(29,4)>
> {code}
> An IN-list expression is always converted to a chain of OR expressions. If 
> the list is relatively small, this works fine. But when the list gets bigger 
> (it had 1000+ values in our production case), it causes performance issues 
> (the response time was more than 10 seconds). Too many OR expressions cause 
> too much time to be spent in the plan optimization phase and the Spark code 
> generation phase. 
>  
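
To make the cost described in the issue concrete, here is a minimal, 
self-contained Java sketch. It is not Kylin, Calcite, or Spark code: the names 
InListSketch, Expr, EqualTo, Or, InSet, asOrChain and asInSet are hypothetical 
and exist only for illustration. It assumes a planner that represents 
predicates as a tree of nodes, and compares the left-deep OR chain shown in the 
plan above with a single set-membership node over the same literals.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: a toy predicate tree, not an API of any real planner. */
public class InListSketch {

    /** A predicate over a single long column value. */
    public interface Expr {
        boolean eval(long value);
        int nodeCount();
    }

    /** column = literal */
    public static final class EqualTo implements Expr {
        private final long literal;
        public EqualTo(long literal) { this.literal = literal; }
        public boolean eval(long value) { return value == literal; }
        public int nodeCount() { return 1; }
    }

    /** left OR right */
    public static final class Or implements Expr {
        private final Expr left;
        private final Expr right;
        public Or(Expr left, Expr right) { this.left = left; this.right = right; }
        public boolean eval(long value) { return left.eval(value) || right.eval(value); }
        public int nodeCount() { return 1 + left.nodeCount() + right.nodeCount(); }
    }

    /** column IN (set of literals), kept as one node backed by a hash set. */
    public static final class InSet implements Expr {
        private final Set<Long> literals;
        public InSet(Collection<Long> literals) { this.literals = new HashSet<>(literals); }
        public boolean eval(long value) { return literals.contains(value); }
        public int nodeCount() { return 1; }
    }

    /** Expands an IN-list into a left-deep OR chain, like the Filter in the plan above. */
    public static Expr asOrChain(List<Long> literals) {
        if (literals.isEmpty()) {
            throw new IllegalArgumentException("IN-list must not be empty");
        }
        Expr result = new EqualTo(literals.get(0));
        for (int i = 1; i < literals.size(); i++) {
            result = new Or(result, new EqualTo(literals.get(i)));
        }
        return result;
    }

    /** Keeps the IN-list as a single set-membership node. */
    public static Expr asInSet(List<Long> literals) {
        return new InSet(literals);
    }

    public static void main(String[] args) {
        // 1000 consecutive ids, roughly the list size reported in the issue.
        List<Long> ids = new ArrayList<>();
        for (long id = 10000001L; id <= 10001000L; id++) {
            ids.add(id);
        }
        Expr orChain = asOrChain(ids);
        Expr inSet = asInSet(ids);
        System.out.println("OR chain nodes: " + orChain.nodeCount());
        System.out.println("IN set nodes:   " + inSet.nodeCount());
        System.out.println("Same answer:    " + (orChain.eval(10000500L) == inSet.eval(10000500L)));
    }
}
```

In this toy model the OR chain needs 2n - 1 nodes for n literals, while the 
set-based form stays at one node, so any pass that visits every expression 
node does work proportional to the list size in the first form only. Whether a 
real planner behaves the same way depends on its rules and is not shown here.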



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
