[
https://issues.apache.org/jira/browse/CALCITE-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17115423#comment-17115423
]
Haisheng Yuan commented on CALCITE-3221:
----------------------------------------
No, I don't think combining concatenation and distinct into a single UNION
operator is good practice at the optimizer level in the long term.
If we create 2 physical operators (EnumerableUnion, EnumerableMergeUnion), then
- EnumerableUnion needs to take care of concatenation (all=true) and hash-based
distinct (all=false) logic. This is currently what we have.
- EnumerableMergeUnion needs to take care of concatenation (all=true,
merge=false), sorted-merge concatenation (all=true, merge=true), and
sorted-merge distinct (all=false, merge=true).
If we go this way, I don't think downstream projects can just reuse the plan
generated by Calcite.
If we only keep physical UNION ALL, then we only need two variants:
concatenation (merge=false) and concatenation (merge=true).
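
To make the two UNION ALL variants concrete, here is a minimal sketch in plain
Java. It is not Calcite code: the class and method names (MergeConcatSketch,
concat, mergeConcat) are hypothetical, and in-memory lists stand in for
operator inputs purely for brevity, whereas a real operator would stream rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Calcite.
public class MergeConcatSketch {

  /** Concatenation (merge=false): all left rows, then all right rows. */
  static <T> List<T> concat(List<T> left, List<T> right) {
    List<T> out = new ArrayList<>(left);
    out.addAll(right);
    return out;
  }

  /**
   * Sorted-merge concatenation (merge=true): assumes both inputs are already
   * sorted by cmp. Duplicates are kept, so this is still UNION ALL, but the
   * output is sorted as well.
   */
  static <T> List<T> mergeConcat(List<T> left, List<T> right,
      Comparator<? super T> cmp) {
    List<T> out = new ArrayList<>(left.size() + right.size());
    int i = 0;
    int j = 0;
    while (i < left.size() && j < right.size()) {
      // On ties take the left row first, which keeps the merge stable.
      if (cmp.compare(left.get(i), right.get(j)) <= 0) {
        out.add(left.get(i++));
      } else {
        out.add(right.get(j++));
      }
    }
    while (i < left.size()) {
      out.add(left.get(i++));
    }
    while (j < right.size()) {
      out.add(right.get(j++));
    }
    return out;
  }

  public static void main(String[] args) {
    List<Integer> a = Arrays.asList(1, 3, 3, 7);
    List<Integer> b = Arrays.asList(2, 3, 8);
    System.out.println(concat(a, b));       // [1, 3, 3, 7, 2, 3, 8]
    System.out.println(
        mergeConcat(a, b, Comparator.<Integer>naturalOrder()));
                                            // [1, 2, 3, 3, 3, 7, 8]
  }
}
```

Neither variant removes duplicates. In this sketch, distinct would be a
separate step applied on top of either output.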
> Add a sort-merge union algorithm
> --------------------------------
>
> Key: CALCITE-3221
> URL: https://issues.apache.org/jira/browse/CALCITE-3221
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.19.0
> Reporter: Stamatis Zampetakis
> Priority: Minor
> Attachments: screenshot-1.png
>
>
> Currently, the union operation offered by Calcite is based on a {{HashSet}}
> (see
> [EnumerableDefaults.union|https://github.com/apache/calcite/blob/d98856bf1a5f5c151d004b769e14bdd368a67234/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L2747])
> and necessitates reading in memory all rows before returning a single
> result.
> Apart from increased memory consumption the operator is blocking and also
> destroys the order of its inputs.
> The goal of this issue is to add a new union algorithm
> (EnumerableMergeUnion?) exploiting the fact that the inputs are sorted, which
> consumes less memory and retains the order of its inputs.
> Most likely the implementation of the merge join can be useful.
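
For illustration, the sort-merge union described in the issue could look
roughly like the sketch below. It is not the Calcite implementation: the
names (MergeUnionSketch, mergeUnion, toList) are hypothetical, rows are
assumed to be non-null, and both inputs are assumed to be sorted by the same
comparator. It keeps only one buffered row per input plus the last emitted
row, and it returns rows as soon as they are known, so it is neither blocking
nor order-destroying.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration only; none of these names exist in Calcite.
public class MergeUnionSketch {

  /** Streaming distinct union of two inputs already sorted by cmp. */
  static <T> Iterator<T> mergeUnion(Iterator<T> left, Iterator<T> right,
      Comparator<? super T> cmp) {
    return new Iterator<T>() {
      // Head row of each input; null means exhausted (rows assumed non-null).
      private T l = advance(left);
      private T r = advance(right);
      private T last;                       // last row emitted
      private T pending = computeNext();    // next row to emit, or null

      private T advance(Iterator<T> it) {
        return it.hasNext() ? it.next() : null;
      }

      private T computeNext() {
        while (l != null || r != null) {
          T candidate;
          if (r == null || (l != null && cmp.compare(l, r) <= 0)) {
            candidate = l;
            l = advance(left);
          } else {
            candidate = r;
            r = advance(right);
          }
          // Inputs are sorted, so duplicates are adjacent in merge order:
          // comparing with the last emitted row is enough to drop them.
          if (last == null || cmp.compare(last, candidate) != 0) {
            last = candidate;
            return candidate;
          }
        }
        return null;
      }

      @Override public boolean hasNext() {
        return pending != null;
      }

      @Override public T next() {
        if (pending == null) {
          throw new NoSuchElementException();
        }
        T result = pending;
        pending = computeNext();
        return result;
      }
    };
  }

  /** Drains an iterator into a list (helper for the demo only). */
  static <T> List<T> toList(Iterator<T> it) {
    List<T> out = new ArrayList<>();
    while (it.hasNext()) {
      out.add(it.next());
    }
    return out;
  }

  public static void main(String[] args) {
    List<Integer> a = Arrays.asList(1, 3, 3, 7);
    List<Integer> b = Arrays.asList(2, 3, 8);
    System.out.println(toList(mergeUnion(a.iterator(), b.iterator(),
        Comparator.<Integer>naturalOrder())));  // [1, 2, 3, 7, 8]
  }
}
```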