[ 
https://issues.apache.org/jira/browse/CALCITE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sysolyatin updated CALCITE-5388:
---------------------------------------
    Description: 
EnumerableWindow.getPartitionIterator method creates expression for creating  
'tempList' array [1] and EnumerableWindow.implement method creates expression 
for clearing this 'tempList'. Because state of created collection is mutable, 
the collection can not be reused in other instance of EnumerableWindow but it 
happens in the following use case:

{code:java}
with
    CTE1(rownr1, val1) as ( select ROW_NUMBER() OVER(ORDER BY id ASC), id from 
(values (1), (2)) as Vals1(id) ),
    CTE2(rownr2, val2) as ( select ROW_NUMBER() OVER(ORDER BY id ASC), id from 
(values (1), (2)) as Vals2(id) )
select
    CTE1.rownr1,
    CTE1.val1,
    CTE2.rownr2,
    CTE2.val2 
from
    CTE1,
    CTE2 
where
    CTE1.val1 = CTE2.val2{code}

Generated plan:
{code}
EnumerableHashJoin(condition=[=($1, $3)], joinType=[inner])
  EnumerableSort(sort0=[$1], dir0=[ASC])
    EnumerableCalc(expr#0..1=[{inputs}], EXPR$0=[$t1], ID=[$t0])
      EnumerableWindow(window#0=[window(order by [0] rows between UNBOUNDED 
PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])])
        EnumerableValues(tuples=[[{ 1 }, { 2 }]])
  EnumerableSort(sort0=[$1], dir0=[ASC])
    EnumerableCalc(expr#0..1=[{inputs}], EXPR$0=[$t1], ID=[$t0])
      EnumerableWindow(window#0=[window(order by [0] rows between UNBOUNDED 
PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])])
        EnumerableValues(tuples=[[{ 1 }, { 2 }]])
{code}

Calcite expression optimizer tries to remove duplicate expressions and as a 
result the same 'tempList' instance is used for both EnumerableWindow and the 
query returns empty result instead of:

|ROWNR1|VAL1|VAL2|
|1|1|1|
|2|2|2|

[1] 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableWindow.java#L677
[2] 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableWindow.java#L696
[3] 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableWindow.java#L518
 

  was:
EnumerableWindow.getPartitionIterator method creates expression for creating  
'tempList' array [1] or 'multiMap' map [2] depends on use case. And 
EnumerableWindow.implement method creates expression for clearing this 
collection. Because state of created collection is mutable, the collection can 
not be reused in other instance of EnumerableWindow but it happens in the 
following use case:

{code:java}
with
    CTE1(rownr1, val1) as ( select ROW_NUMBER() OVER(ORDER BY id ASC), id from 
(values (1), (2)) as Vals1(id) ),
    CTE2(rownr2, val2) as ( select ROW_NUMBER() OVER(ORDER BY id ASC), id from 
(values (1), (2)) as Vals2(id) )
select
    CTE1.rownr1,
    CTE1.val1,
    CTE2.rownr2,
    CTE2.val2 
from
    CTE1,
    CTE2 
where
    CTE1.val1 = CTE2.val2{code}

Generated plan:
{code}
EnumerableHashJoin(condition=[=($1, $3)], joinType=[inner])
  EnumerableSort(sort0=[$1], dir0=[ASC])
    EnumerableCalc(expr#0..1=[{inputs}], EXPR$0=[$t1], ID=[$t0])
      EnumerableWindow(window#0=[window(order by [0] rows between UNBOUNDED 
PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])])
        EnumerableValues(tuples=[[{ 1 }, { 2 }]])
  EnumerableSort(sort0=[$1], dir0=[ASC])
    EnumerableCalc(expr#0..1=[{inputs}], EXPR$0=[$t1], ID=[$t0])
      EnumerableWindow(window#0=[window(order by [0] rows between UNBOUNDED 
PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])])
        EnumerableValues(tuples=[[{ 1 }, { 2 }]])
{code}

Calcite expression optimizer tries to remove duplicate expressions and as a 
result the same 'tempList' instance is used for both EnumerableWindow and the 
query returns empty result instead of:

|ROWNR1|VAL1|VAL2|
|1|1|1|
|2|2|2|

[1] 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableWindow.java#L677
[2] 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableWindow.java#L696
[3] 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableWindow.java#L518
 


> tempList expression inside EnumerableWindow.getPartitionIterator should be 
> unoptimized
> --------------------------------------------------------------------------------------
>
>                 Key: CALCITE-5388
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5388
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.32.0
>            Reporter: Magnus Mogren
>            Assignee: Dmitry Sysolyatin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.33.0
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> EnumerableWindow.getPartitionIterator method creates expression for creating  
> 'tempList' array [1] and EnumerableWindow.implement method creates expression 
> for clearing this 'tempList'. Because state of created collection is mutable, 
> the collection can not be reused in other instance of EnumerableWindow but it 
> happens in the following use case:
> {code:java}
> with
>     CTE1(rownr1, val1) as ( select ROW_NUMBER() OVER(ORDER BY id ASC), id 
> from (values (1), (2)) as Vals1(id) ),
>     CTE2(rownr2, val2) as ( select ROW_NUMBER() OVER(ORDER BY id ASC), id 
> from (values (1), (2)) as Vals2(id) )
> select
>     CTE1.rownr1,
>     CTE1.val1,
>     CTE2.rownr2,
>     CTE2.val2 
> from
>     CTE1,
>     CTE2 
> where
>     CTE1.val1 = CTE2.val2{code}
> Generated plan:
> {code}
> EnumerableHashJoin(condition=[=($1, $3)], joinType=[inner])
>   EnumerableSort(sort0=[$1], dir0=[ASC])
>     EnumerableCalc(expr#0..1=[{inputs}], EXPR$0=[$t1], ID=[$t0])
>       EnumerableWindow(window#0=[window(order by [0] rows between UNBOUNDED 
> PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])])
>         EnumerableValues(tuples=[[{ 1 }, { 2 }]])
>   EnumerableSort(sort0=[$1], dir0=[ASC])
>     EnumerableCalc(expr#0..1=[{inputs}], EXPR$0=[$t1], ID=[$t0])
>       EnumerableWindow(window#0=[window(order by [0] rows between UNBOUNDED 
> PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])])
>         EnumerableValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> Calcite expression optimizer tries to remove duplicate expressions and as a 
> result the same 'tempList' instance is used for both EnumerableWindow and the 
> query returns empty result instead of:
> |ROWNR1|VAL1|VAL2|
> |1|1|1|
> |2|2|2|
> [1] 
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableWindow.java#L677
> [2] 
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableWindow.java#L696
> [3] 
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableWindow.java#L518
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to