[ 
https://issues.apache.org/jira/browse/CALCITE-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Runkang He updated CALCITE-5661:
--------------------------------
    Description: 
When IN list is large, the plan generation is time-consuming, after benchmark, 
when the IN value list size was 3w, it took 2 minutes to generate the final 
plan.
{code:sql}
select empno from emp where deptno in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
..., 30000){code}
We find that in sql-to-rel phase, there are two methods to convert IN predicate 
to RelNode:

1.IN list size is below InSubQueryThreshold, convert IN to OR;

2.IN list size is over InSubQueryThreshold, convert IN to VALUES + JOIN.

The first one will be very time-consuming in the expression simplification 
stage for the large OR predicate. As mentioned before, when the IN value list 
size was 3w, it took 2 minutes, which is not acceptable in OLAP scenarios.

The second one will not be able to apply predicate pushdown, which it is very 
important in OLAP scenarios.

So maybe we need to support converting IN to RexCall directly to avoid the 
disadvantages of the above two methods.

After POC, when convert IN to RexCall directly, it takes less than 1 second to 
generate the final plan.

  was:
When IN list is large, the plan generation is time-consuming, after benchmark, 
when the IN value list size was 3w, it took 2 minutes to generate the final 
plan.
{code:sql}
select empno from emp where deptno in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
..., 30000){code}
We find that in sql-to-rel phase, there are two methods to convert IN predicate 
to RelNode:

1.IN list size is below InSubQueryThreshold, convert IN to OR;

2.IN list size is over InSubQueryThreshold, convert IN to VALUES + JOIN.

The first one will be very time-consuming in the expression simplification 
stage for the large OR predicate. As mentioned before, when the IN value list 
size was 3w, it took 2 minutes, which is not acceptable in OLAP scenarios.

The second one will not be able to apply IN predicate pushdown, which it is 
very important in OLAP scenarios.

So maybe we need to support converting IN to RexCall directly to avoid the 
disadvantages of the above two methods.

After POC, when convert IN to RexCall directly, it takes less than 1 second to 
generate the final plan.


> Introduce another way to convert IN predicate to RelNode when IN list is large
> ------------------------------------------------------------------------------
>
>                 Key: CALCITE-5661
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5661
>             Project: Calcite
>          Issue Type: Improvement
>    Affects Versions: 1.34.0
>            Reporter: Runkang He
>            Priority: Major
>
> When IN list is large, the plan generation is time-consuming, after 
> benchmark, when the IN value list size was 3w, it took 2 minutes to generate 
> the final plan.
> {code:sql}
> select empno from emp where deptno in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
> ..., 30000){code}
> We find that in sql-to-rel phase, there are two methods to convert IN 
> predicate to RelNode:
> 1.IN list size is below InSubQueryThreshold, convert IN to OR;
> 2.IN list size is over InSubQueryThreshold, convert IN to VALUES + JOIN.
> The first one will be very time-consuming in the expression simplification 
> stage for the large OR predicate. As mentioned before, when the IN value list 
> size was 3w, it took 2 minutes, which is not acceptable in OLAP scenarios.
> The second one will not be able to apply predicate pushdown, which it is very 
> important in OLAP scenarios.
> So maybe we need to support converting IN to RexCall directly to avoid the 
> disadvantages of the above two methods.
> After POC, when convert IN to RexCall directly, it takes less than 1 second 
> to generate the final plan.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to