[
https://issues.apache.org/jira/browse/BEAM-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944063#comment-16944063
]
Rui Wang commented on BEAM-7049:
--------------------------------
Ah ok. I found a tricky problem: with the merged cost-based optimization and
the UnionMerge rule enabled, the Calcite planner falls into an infinite loop
and never chooses a plan.
Finding the root cause will take me many hours of digging into the Calcite
planner, BeamSQL's CBO implementation, and related code.
To answer your last question: what I had in mind was two rules, one for
UNION ALL and one for UNION, each overriding [1]. The UNION ALL rule would then
fire only for UNION ALL queries, and the UNION rule only for UNION queries.
That way you can keep the implementations of the underlying PTransforms
separate.
[1]:
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/plan/RelOptRule.java#L511
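A minimal sketch of the two-rule idea, assuming [1] points at a rule's
matches() hook, which a rule can override to veto firing. Everything below
is a hypothetical stand-in written for illustration (UnionNode, UnionRule,
plan, and the transform labels are not Calcite or Beam classes), so it shows
only the dispatch pattern, not the real planner API.

```java
import java.util.List;

public class UnionRuleSketch {
    // Hypothetical stand-in for a logical union node.
    // `all` is true for UNION ALL and false for UNION (distinct).
    static final class UnionNode {
        final boolean all;
        final List<String> inputs;

        UnionNode(boolean all, List<String> inputs) {
            this.all = all;
            this.inputs = inputs;
        }
    }

    // Hypothetical stand-in for a planner rule: matches() accepts everything
    // by default, and subclasses override it to narrow when the rule fires.
    abstract static class UnionRule {
        boolean matches(UnionNode union) {
            return true;
        }

        abstract String convert(UnionNode union);
    }

    // Fires only for UNION ALL.
    static final class UnionAllRule extends UnionRule {
        @Override
        boolean matches(UnionNode union) {
            return union.all;
        }

        @Override
        String convert(UnionNode union) {
            // Placeholder label for the UNION ALL PTransform.
            return "UnionAllTransform" + union.inputs;
        }
    }

    // Fires only for UNION (distinct).
    static final class UnionDistinctRule extends UnionRule {
        @Override
        boolean matches(UnionNode union) {
            return !union.all;
        }

        @Override
        String convert(UnionNode union) {
            // Placeholder label for the UNION (distinct) PTransform.
            return "UnionDistinctTransform" + union.inputs;
        }
    }

    // Toy "planner": applies the first rule whose matches() returns true.
    static String plan(UnionNode union) {
        for (UnionRule rule : List.of(new UnionAllRule(), new UnionDistinctRule())) {
            if (rule.matches(union)) {
                return rule.convert(union);
            }
        }
        throw new IllegalStateException("no rule matched");
    }

    public static void main(String[] args) {
        System.out.println(plan(new UnionNode(true, List.of("a", "b", "c"))));
        System.out.println(plan(new UnionNode(false, List.of("a", "b", "c"))));
    }
}
```

Because each rule rejects the other kind of union in matches(), exactly one
rule fires per node, and each convert() is free to build a different
PTransform.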
> Merge multiple input to one BeamUnionRel
> ----------------------------------------
>
> Key: BEAM-7049
> URL: https://issues.apache.org/jira/browse/BEAM-7049
> Project: Beam
> Issue Type: Improvement
> Components: dsl-sql
> Reporter: Rui Wang
> Assignee: sridhar Reddy
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
>
> BeamUnionRel assumes exactly two inputs and rejects more. So `a UNION b UNION c`
> has to be created as UNION(a, UNION(b, c)), which needs two shuffles. If
> BeamUnionRel can handle multiple inputs, we will have only one shuffle