[jira] Commented: (HIVE-870) semi joins

Namit Jain (JIRA) Sun, 08 Nov 2009 21:28:57 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774863#action_12774863
 ]


Namit Jain commented on HIVE-870:
---------------------------------

Can you also add more tests with STREAMTABLE?

Do you want to separate out the comment changes and file a new jira for that?
They are blowing up the number of files and making the patch difficult to
review. If you think that will help, please file a new jira and submit a patch
for it; I will try to take a look at it asap.

> semi joins
> ----------
>
>                 Key: HIVE-870
>                 URL: https://issues.apache.org/jira/browse/HIVE-870
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: Hive-870.patch, Hive-870_2.patch
>
>
> Semi-join is an efficient way to unnest an IN/EXISTS subquery. For example,
> select * 
> from A
> where A.id IN 
>    (select id
>     from B
>     where B.date > '2009-10-01');
> returns the rows from A whose ID is in the set of IDs found in B for rows
> whose date is later than a given date. This query can be unnested using an
> INNER JOIN or a LEFT OUTER JOIN, but we then need to deduplicate the IDs
> returned by the subquery on table B. The semantics of LEFT SEMI JOIN is that
> as long as there is ANY row in the right-hand table that matches the join
> key, the left-hand table row is emitted as a result, without necessarily
> looking further in the right-hand table for more matches. This is exactly
> the semantics of the IN subquery.
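
For illustration only, the two unnested forms described in the quoted text could look roughly like the sketch below. It reuses the tables A and B from the example; the alias B_ids is made up here, and placing the date filter in the ON clause is an assumption about how the proposed syntax would be used, not something taken from the patch.

```sql
-- Sketch 1: INNER JOIN form. The subquery must deduplicate B's ids,
-- otherwise a row of A is returned once per matching row of B.
-- B_ids is a hypothetical alias introduced for this example.
SELECT A.*
FROM A
JOIN (SELECT DISTINCT id
      FROM B
      WHERE B.date > '2009-10-01') B_ids
  ON (A.id = B_ids.id);

-- Sketch 2: LEFT SEMI JOIN form. Each row of A is emitted at most once,
-- as soon as any matching row of B is found, so no DISTINCT is needed.
-- Assumption: the filter on B is placed in the ON clause, and B is not
-- referenced in the SELECT list or a WHERE clause.
SELECT A.*
FROM A
LEFT SEMI JOIN B
  ON (A.id = B.id AND B.date > '2009-10-01');
```

Both sketches are meant to return the same rows as the IN subquery; the second avoids the explicit deduplication step.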

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
