[ 
https://issues.apache.org/jira/browse/BEAM-9825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096943#comment-17096943
 ] 

Rui Wang edited comment on BEAM-9825 at 4/30/20, 8:02 PM:
----------------------------------------------------------

Hello Darshan,

Thanks for opening this Jira.

First of all, I think your proposal is to implement a few more composite 
transforms that further encapsulate SQL's UNION/INTERSECT/EXCEPT. Right now 
BeamSQL implements these SET operations in two steps: a CoGroup and then a 
filter [1]. Your proposal would merge these two steps into a single 
composite transform. 
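
To make the two steps concrete, here is a minimal sketch in plain Java. It does not use any Beam classes and is not the code behind [1]; the class and method names are made up for illustration, and rows are simplified to strings. It only shows the idea of grouping both inputs by row and then filtering the grouped result, here for INTERSECT DISTINCT.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; hypothetical names, no Beam classes involved.
class CoGroupThenFilterSketch {

  // Step 1: "co-group" both inputs by the row itself, counting how many
  // times each row appears on the left (index 0) and on the right (index 1).
  static Map<String, int[]> coGroup(List<String> left, List<String> right) {
    Map<String, int[]> grouped = new LinkedHashMap<>();
    for (String row : left) {
      grouped.computeIfAbsent(row, k -> new int[2])[0]++;
    }
    for (String row : right) {
      grouped.computeIfAbsent(row, k -> new int[2])[1]++;
    }
    return grouped;
  }

  // Step 2: filter the grouped result. For INTERSECT DISTINCT a row is
  // emitted exactly once if it appears at least once on both sides.
  static List<String> intersectDistinct(List<String> left, List<String> right) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, int[]> entry : coGroup(left, right).entrySet()) {
      int[] counts = entry.getValue();
      if (counts[0] > 0 && counts[1] > 0) {
        out.add(entry.getKey());
      }
    }
    return out;
  }
}
```

A single composite transform would hide both steps behind one call, which is what makes it reusable outside BeamSQL.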


I am OK with having these transforms implemented in [2], because SET 
operations from relational algebra have clear semantics, and building SQL on 
schema operations is the ultimate goal for BeamSQL. Furthermore, other users 
can reuse these transforms rather than doing the two-step operation themselves.

If you want to open a PR, please consider the following advice:
1. Use terms from SQL, e.g. name your transforms UNION, INTERSECT and EXCEPT 
(or MINUS).
2. Support both SET ALL and SET DISTINCT semantics.
3. Migrate the BeamSQL SET implementation to your implementation.
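
As a reminder of what point 2 implies, the sketch below spells out the usual SQL multiset rules: if a row appears m times in the left input and n times in the right input, how many copies should each operation emit? The class, enum and method names are hypothetical and this is not Beam code; it is only meant as a reference when writing tests.

```java
// Plain-Java illustration only; hypothetical names, no Beam classes involved.
class SetSemanticsSketch {

  enum Op { UNION, INTERSECT, EXCEPT }

  // Number of copies of a row in the output, given that the row occurs
  // m times in the left input and n times in the right input.
  static int copies(Op op, boolean all, int m, int n) {
    switch (op) {
      case UNION:
        // ALL keeps every copy; DISTINCT keeps one if the row exists at all.
        return all ? m + n : (m + n > 0 ? 1 : 0);
      case INTERSECT:
        // ALL keeps min(m, n) copies; DISTINCT keeps one if both sides have it.
        return all ? Math.min(m, n) : (m > 0 && n > 0 ? 1 : 0);
      case EXCEPT:
        // ALL keeps max(m - n, 0) copies; DISTINCT keeps one only if the
        // row is on the left and completely absent from the right.
        return all ? Math.max(m - n, 0) : (m > 0 && n == 0 ? 1 : 0);
      default:
        throw new IllegalArgumentException("unknown op: " + op);
    }
  }
}
```

Note that EXCEPT DISTINCT is not simply EXCEPT ALL followed by deduplication: with m = 3 and n = 1, EXCEPT ALL emits two copies while EXCEPT DISTINCT emits none.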


[1]: 
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSetOperatorRelBase.java#L86
[2]: 
https://github.com/apache/beam/tree/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms
 


> Transforms for Intersect, Difference and Commons 
> -------------------------------------------------
>
>                 Key: BEAM-9825
>                 URL: https://issues.apache.org/jira/browse/BEAM-9825
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core
>            Reporter: Darshan Jani
>            Assignee: Darshan Jani
>            Priority: Major
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> I'd like to propose the following new high-level transforms.
>  * Intersect
> Compute the intersection between the elements of two PCollections.
> Given _leftCollection_ and _rightCollection_, this transform returns a 
> collection containing the elements that are common to both _leftCollection_ 
> and _rightCollection_.
>  
>  * Difference
> Compute the difference between the elements of two PCollections.
> Given _leftCollection_ and _rightCollection_, this transform returns a 
> collection containing the elements that are in _leftCollection_ but not in 
> _rightCollection_.
>  * Commons
> Find the elements that are common to two PCollections, similar to the Unix
> comm utility.
> Given _leftCollection_ and _rightCollection_, this transform returns a 
> CommonsResults with the following:
>  # elements only in _leftCollection_
>  # elements only in _rightCollection_
>  # elements in both collections
> I would like to work on these changes and submit a PR.
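
For reference, the three-way split that the quoted Commons proposal describes can be sketched in plain Java as below. This is only an illustration under assumptions: the `Result` holder is a hypothetical stand-in for the proposed CommonsResults (whose actual shape is not given in the issue), elements are simplified to strings, and duplicates are collapsed, which the proposal does not specify.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration only; hypothetical names, no Beam classes involved.
class CommonsSketch {

  // Hypothetical stand-in for the proposed CommonsResults.
  static final class Result {
    final Set<String> onlyLeft = new LinkedHashSet<>();
    final Set<String> onlyRight = new LinkedHashSet<>();
    final Set<String> both = new LinkedHashSet<>();
  }

  // Splits the distinct elements of the two inputs into three groups,
  // in the spirit of the three columns printed by the Unix comm utility.
  static Result commons(List<String> left, List<String> right) {
    Set<String> leftSet = new LinkedHashSet<>(left);
    Set<String> rightSet = new LinkedHashSet<>(right);
    Result result = new Result();
    for (String element : leftSet) {
      if (rightSet.contains(element)) {
        result.both.add(element);
      } else {
        result.onlyLeft.add(element);
      }
    }
    for (String element : rightSet) {
      if (!leftSet.contains(element)) {
        result.onlyRight.add(element);
      }
    }
    return result;
  }
}
```

In this sketch `both` corresponds to Intersect and `onlyLeft` to Difference, so all three proposed transforms could share the same grouping step.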



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
