[ 
https://issues.apache.org/jira/browse/BEAM-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899191#comment-16899191
 ] 

Rui Wang edited comment on BEAM-7049 at 8/2/19 8:29 PM:
--------------------------------------------------------

[~sridharG]

A good start query is
 " SELECT 1 UNION 2 UNION 3".

good entry pointers are:

https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamUnionRule.java#L43
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnionRel.java#L66
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSetOperatorRelBase.java#L61


The expected resolution is you can use multiple tags to GoGBK multiple 
PCollections through the same shuffle: 
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSetOperatorRelBase.java#L85

And then you can improve binary implementation to make it handle more than two 
PCollection here: 
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java#L60


was (Author: amaliujia):
A good start query is
 " SELECT 1 UNION 2 UNION 3".

good entry pointers are:

https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamUnionRule.java#L43
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnionRel.java#L66
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSetOperatorRelBase.java#L61


The expected resolution is you can use multiple tags to GoGBK multiple 
PCollections through the same shuffle: 
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSetOperatorRelBase.java#L85

And then you can improve binary implementation to make it handle more than two 
PCollection here: 
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java#L60

> BeamUnionRel should work on mutiple input 
> ------------------------------------------
>
>                 Key: BEAM-7049
>                 URL: https://issues.apache.org/jira/browse/BEAM-7049
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-sql
>            Reporter: Rui Wang
>            Assignee: sridhar Reddy
>            Priority: Major
>
> BeamUnionRel assumes inputs are two and rejects more. So `a UNION b UNION c` 
> will have to be created as UNION(a, UNION(b, c)) and have two shuffles. If 
> BeamUnionRel can handle multiple shuffles, we will have only one shuffle



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

Reply via email to