[ 
https://issues.apache.org/jira/browse/BEAM-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944063#comment-16944063
 ] 

Rui Wang commented on BEAM-7049:
--------------------------------

Ah, OK. I found a tricky problem: with the cost-based optimization now merged 
and the UnionMerge rule enabled, the Calcite planner falls into an infinite 
loop and never chooses a plan. 

This is a very tricky problem. I will need to spend many hours studying the 
Calcite planner, BeamSQL's CBO implementation, and related code to understand 
the root cause. 


To answer your last question: my idea was to have two rules, one for UNION ALL 
and one for UNION, with each rule overriding [1]. The UNION ALL rule would then 
fire only for UNION ALL queries, and the UNION rule only for UNION queries. 
That way you can keep the implementations of the underlying PTransforms separate.
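
A minimal sketch of the two-rule idea, in plain Java with no Calcite dependency. 
It assumes the method at [1] is the rule's match predicate, which the comment 
does not state explicitly. UnionNode, UnionRule, UnionAllRule and 
UnionDistinctRule are hypothetical stand-ins for illustration, not Calcite or 
Beam classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnionRuleSketch {
    /** Hypothetical stand-in for a logical union node; all == true means UNION ALL. */
    static final class UnionNode {
        final boolean all;
        final List<String> inputs;

        UnionNode(boolean all, String... inputs) {
            this.all = all;
            this.inputs = Arrays.asList(inputs);
        }
    }

    /** Hypothetical stand-in for a planner rule: matches every node unless overridden. */
    abstract static class UnionRule {
        boolean matches(UnionNode node) {
            return true;
        }

        abstract String name();
    }

    /** Fires only for UNION ALL. */
    static final class UnionAllRule extends UnionRule {
        @Override
        boolean matches(UnionNode node) {
            return node.all;
        }

        @Override
        String name() {
            return "UnionAllRule";
        }
    }

    /** Fires only for UNION (distinct). */
    static final class UnionDistinctRule extends UnionRule {
        @Override
        boolean matches(UnionNode node) {
            return !node.all;
        }

        @Override
        String name() {
            return "UnionDistinctRule";
        }
    }

    /** Returns the names of the rules whose match predicate accepts the node. */
    static List<String> firingRules(UnionNode node) {
        List<UnionRule> rules = Arrays.asList(new UnionAllRule(), new UnionDistinctRule());
        List<String> fired = new ArrayList<>();
        for (UnionRule rule : rules) {
            if (rule.matches(node)) {
                fired.add(rule.name());
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        System.out.println(firingRules(new UnionNode(true, "a", "b")));
        System.out.println(firingRules(new UnionNode(false, "a", "b")));
    }
}
```

Because exactly one predicate accepts any given node, each rule can convert to 
its own PTransform without checking the ALL flag again.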


[1]: 
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/plan/RelOptRule.java#L511
 

> Merge multiple input to one BeamUnionRel
> ----------------------------------------
>
>                 Key: BEAM-7049
>                 URL: https://issues.apache.org/jira/browse/BEAM-7049
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-sql
>            Reporter: Rui Wang
>            Assignee: sridhar Reddy
>            Priority: Major
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> BeamUnionRel assumes exactly two inputs and rejects more. So `a UNION b UNION c` 
> has to be planned as UNION(a, UNION(b, c)), which needs two shuffles. If 
> BeamUnionRel could handle multiple inputs, only one shuffle would be needed.
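
The merge described above can be sketched as a tree flattening. This is an 
illustrative model only: Node, leaf, union, flatten, countUnions and render are 
hypothetical names, not Beam or Calcite APIs, and the sketch conservatively 
merges a nested union only when it has the same ALL flag as its parent. 
Counting union nodes stands in for counting shuffles, following the issue 
description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnionFlattenSketch {
    /** Hypothetical plan node: either a leaf table or a union over inputs. */
    static final class Node {
        final String table;      // non-null only for leaves
        final boolean all;       // true for UNION ALL; unused for leaves
        final List<Node> inputs; // empty for leaves

        private Node(String table, boolean all, List<Node> inputs) {
            this.table = table;
            this.all = all;
            this.inputs = inputs;
        }

        boolean isUnion() {
            return table == null;
        }
    }

    static Node leaf(String table) {
        return new Node(table, false, new ArrayList<Node>());
    }

    static Node union(boolean all, Node... inputs) {
        return new Node(null, all, Arrays.asList(inputs));
    }

    /** Splices the inputs of nested unions with the same ALL flag into the parent. */
    static Node flatten(Node node) {
        if (!node.isUnion()) {
            return node;
        }
        List<Node> merged = new ArrayList<>();
        for (Node input : node.inputs) {
            Node child = flatten(input);
            if (child.isUnion() && child.all == node.all) {
                merged.addAll(child.inputs);
            } else {
                merged.add(child);
            }
        }
        return new Node(null, node.all, merged);
    }

    /** Number of union nodes in the tree (one shuffle each, per the description). */
    static int countUnions(Node node) {
        int count = node.isUnion() ? 1 : 0;
        for (Node input : node.inputs) {
            count += countUnions(input);
        }
        return count;
    }

    static String render(Node node) {
        if (!node.isUnion()) {
            return node.table;
        }
        StringBuilder sb = new StringBuilder(node.all ? "UNION ALL(" : "UNION(");
        for (int i = 0; i < node.inputs.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(render(node.inputs.get(i)));
        }
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        Node nested = union(false, leaf("a"), union(false, leaf("b"), leaf("c")));
        Node flat = flatten(nested);
        System.out.println(render(nested) + " has " + countUnions(nested) + " unions");
        System.out.println(render(flat) + " has " + countUnions(flat) + " unions");
    }
}
```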



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
