[ 
https://issues.apache.org/jira/browse/CALCITE-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruben Q L updated CALCITE-3221:
-------------------------------
    Description: 
Currently, the union operation offered by Calcite (see 
[EnumerableDefaults.union|https://github.com/apache/calcite/blob/d98856bf1a5f5c151d004b769e14bdd368a67234/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L2747])
 does not preserve the collation (sort order), if any, of its inputs.

The goal of this issue is to create a new union algorithm 
(EnumerableMergeUnion) that, given inputs that are sorted by a certain 
collation, returns the union / union all result in that same collation.

The implementation of the merge join will most likely be useful as a reference.
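
The idea can be illustrated with a minimal, self-contained sketch. It is not the proposed EnumerableMergeUnion and does not use any Calcite API; the class and method names (MergeUnionSketch, mergeUnion) are hypothetical. It assumes both inputs are already sorted by the given comparator and that the comparator orders whole rows, so that equal rows end up adjacent after merging. If the collation covered only some of the columns, a distinct union would additionally have to remember the rows already emitted for the current key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not Calcite code.
class MergeUnionSketch {

  /**
   * Merges two inputs that are already sorted by cmp.
   * all = true  -> UNION ALL (keeps duplicates)
   * all = false -> UNION (drops rows equal to the previously emitted row)
   * Assumption: cmp compares whole rows, so duplicates are adjacent.
   */
  static <T> List<T> mergeUnion(List<T> left, List<T> right,
      Comparator<? super T> cmp, boolean all) {
    List<T> out = new ArrayList<>();
    int i = 0;
    int j = 0;
    while (i < left.size() || j < right.size()) {
      T next;
      // Take from the left input when the right one is exhausted,
      // or when the left head sorts before (or equal to) the right head.
      if (j >= right.size()
          || (i < left.size() && cmp.compare(left.get(i), right.get(j)) <= 0)) {
        next = left.get(i++);
      } else {
        next = right.get(j++);
      }
      // The output stays sorted, so a duplicate can only follow its twin.
      if (all || out.isEmpty()
          || cmp.compare(out.get(out.size() - 1), next) != 0) {
        out.add(next);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Integer> left = Arrays.asList(1, 3, 5);
    List<Integer> right = Arrays.asList(1, 2, 5, 6);
    Comparator<Integer> cmp = Comparator.<Integer>naturalOrder();
    System.out.println(mergeUnion(left, right, cmp, false)); // [1, 2, 3, 5, 6]
    System.out.println(mergeUnion(left, right, cmp, true));  // [1, 1, 2, 3, 5, 5, 6]
  }
}
```

The sketch materializes the result in a list for brevity; an operator built on this idea would more likely produce rows lazily, one at a time, from the two input enumerators.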

  was:
Currently, the union operation offered by Calcite is based on a {{HashSet}} 
(see 
[EnumerableDefaults.union|https://github.com/apache/calcite/blob/d98856bf1a5f5c151d004b769e14bdd368a67234/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L2747])
 and requires reading all rows into memory before returning a single result.

Apart from the increased memory consumption, the operator is blocking and also 
destroys the order of its inputs.

The goal of this issue is to add a new union algorithm (EnumerableMergeUnion ?) 
that exploits the fact that the inputs are sorted, consumes less memory, and 
retains the order of its inputs.

Most likely the implementation of the merge join can be useful.


> Add a sort-merge union algorithm
> --------------------------------
>
>                 Key: CALCITE-3221
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3221
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.19.0
>            Reporter: Stamatis Zampetakis
>            Assignee: Ruben Q L
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: screenshot-1.png
>
>          Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> Currently, the union operation offered by Calcite (see 
> [EnumerableDefaults.union|https://github.com/apache/calcite/blob/d98856bf1a5f5c151d004b769e14bdd368a67234/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L2747])
>  does not preserve the collation (sort order), if any, of its inputs.
> The goal of this issue is to create a new union algorithm 
> (EnumerableMergeUnion) that, given inputs that are sorted by a certain 
> collation, returns the union / union all result in that same collation.
> The implementation of the merge join will most likely be useful as a reference.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
