[
https://issues.apache.org/jira/browse/HIVE-870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ning Zhang updated HIVE-870:
----------------------------
Attachment: Hive-870.patch
Uploading Hive-870.patch. This patch includes the following changes:
1) enhance the HiveQL syntax to support left semi join.
2) introduce a new left semi join type in the CommonJoinOperator. This join
operator implements early-exit whenever a match is found in the right-hand-side
table of the left semi join.
3) At the map side, add a select operator to project the join keys only of the
RHS table, followed by a map-side partial group-by operator that eliminate
duplicate keys. We only need the key, the value is NULL.
4) if the RHS is used as map-side join, only the selection operator is
introduced. The map-side groupby operator is not necessary.
5) some misc clean ups (e.g., allowing '--' comments appear in any place in the
unit test qfiles). A lot of unit test diffs are due to this change. All the
unit tests for semi join are in semijoin.q.
> semi joins
> ----------
>
> Key: HIVE-870
> URL: https://issues.apache.org/jira/browse/HIVE-870
> Project: Hadoop Hive
> Issue Type: New Feature
> Reporter: Ning Zhang
> Assignee: Ning Zhang
> Attachments: Hive-870.patch
>
>
> Semi-join is an efficient way to unnest an IN/EXISTS subquery. For example,
> select *
> from A
> where A.id IN
> (select id
> from B
> where B.date> '2009-10-01');
> returns from A whose ID is in the set of IDs found in B, whose date is
> greater than a certain date. This query can be unnested using a INNER join or
> LEFT OUTER JOIN, but we need to deduplicate the IDs returned by the subquery
> on table B. The semantics of LEFT SEMI JOIN is that as long as there is ANY
> row in the right-hand table that matches the join key, the left-hand table
> row will be emitted as a result w/o necessarily looking further in the
> right-hand table for further matches. This is exactly the semantics of the IN
> subquery.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.