[
https://issues.apache.org/jira/browse/SPARK-21274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16566399#comment-16566399
]
Apache Spark commented on SPARK-21274:
--------------------------------------
User 'dilipbiswal' has created a pull request for this issue:
https://github.com/apache/spark/pull/21963
> Implement EXCEPT ALL and INTERSECT ALL
> --------------------------------------
>
> Key: SPARK-21274
> URL: https://issues.apache.org/jira/browse/SPARK-21274
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 2.0.0, 2.1.0, 2.2.0
> Reporter: Ruslan Dautkhanov
> Assignee: Dilip Biswal
> Priority: Major
> Fix For: 2.4.0
>
>
> 1) *EXCEPT ALL* / MINUS ALL :
> {code}
> SELECT a,b,c FROM tab1
> EXCEPT ALL
> SELECT a,b,c FROM tab2
> {code}
> can be rewritten as following outer join:
> {code}
> SELECT a,b,c
> FROM tab1 t1
> LEFT OUTER JOIN
> tab2 t2
> ON (
> (t1.a, t1.b, t1.c) = (t2.a, t2.b, t2.c)
> )
> WHERE
> COALESCE(t2.a, t2.b, t2.c) IS NULL
> {code}
> (register as a temp.view this second query under "*t1_except_t2_df*" name
> that can be also used to find INTERSECT ALL below):
> 2) *INTERSECT ALL*:
> {code}
> SELECT a,b,c FROM tab1
> INTERSECT ALL
> SELECT a,b,c FROM tab2
> {code}
> can be rewritten as following anti-join using t1_except_t2_df we defined
> above:
> {code}
> SELECT a,b,c
> FROM tab1 t1
> WHERE
> NOT EXISTS
> ( SELECT 1
> FROM t1_except_t2_df e
> WHERE (t1.a, t1.b, t1.c) = (e.a, e.b, e.c)
> )
> {code}
> So the suggestion is just to use above query rewrites to implement both
> EXCEPT ALL and INTERSECT ALL sql set operations.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]