[jira] Commented: (HIVE-741) NULL is not handled correctly in join

Ning Zhang (JIRA) Sun, 15 Aug 2010 14:38:43 -0700

    [ https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898756#action_12898756 ]


Ning Zhang commented on HIVE-741:
---------------------------------

@Amareshwari, we already distinguish the different join types with 
different functions (take a look at CommonJoinOperator.joinObjects()). I look 
forward to seeing your proposal for avoiding the grouping of null-keyed rows.

@Ted, I agree with Amareshwari and John that we cannot simply skip rows (or 
the value part of the key-value pairs) that have null as a key. However, you 
have a point: if we know the join operator does not involve any outer join 
(we already have a flag noOuterJoin in JoinDesc), then we could avoid sending 
rows with null keys from the mappers to the reducers. This would save 
bandwidth as well as processing time. Could you open another JIRA and submit 
a patch?
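
To make the idea concrete, here is a minimal sketch of the map-side filter, 
not Hive's actual code. Only the flag name noOuterJoin comes from JoinDesc; 
the class NullKeyFilterSketch, the Row type and the method rowsToEmit are 
hypothetical names invented for illustration, and the sketch assumes a single 
nullable join key per row.

```java
import java.util.ArrayList;
import java.util.List;

public class NullKeyFilterSketch {

    /** Hypothetical row: a nullable join key plus an opaque value. */
    static final class Row {
        final Integer key;
        final String value;

        Row(Integer key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Returns the rows a mapper would forward to the reducers.
     * When no outer join is involved, a null-keyed row can never
     * contribute to the output, so it is dropped before the shuffle.
     * With an outer join present, every row is forwarded unchanged.
     */
    static List<Row> rowsToEmit(List<Row> input, boolean noOuterJoin) {
        List<Row> out = new ArrayList<>();
        for (Row row : input) {
            if (noOuterJoin && row.key == null) {
                continue; // skip: cannot match under inner-join semantics
            }
            out.add(row);
        }
        return out;
    }
}
```

With noOuterJoin set to true, only rows with a non-null key reach the 
reducers; with it set to false, nothing is filtered.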


> NULL is not handled correctly in join
> -------------------------------------
>
>                 Key: HIVE-741
>                 URL: https://issues.apache.org/jira/browse/HIVE-741
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Ning Zhang
>            Assignee: Amareshwari Sriramadasu
>
> With the following data in table input4_cb:
> Key     Value
> ------  ------
> NULL    325
> 18      NULL
> The following query:
> {code}
> select * from input4_cb a join input4_cb b on a.key = b.value;
> {code}
> returns the following result:
> NULL    325    18   NULL
> The correct result should be an empty set.
> When 'null' is replaced by '' it works.
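
The expected empty result follows from SQL equality, under which NULL never 
matches anything, including another NULL. The sketch below illustrates that 
rule with a plain nested-loop join over the two rows from the description. It 
is an illustration only, not Hive code: NullJoinSketch, Row, sqlEquals and 
innerJoin are hypothetical names, and both columns are assumed to be nullable 
integers.

```java
import java.util.ArrayList;
import java.util.List;

public class NullJoinSketch {

    /** One row of the example table; both columns may be null. */
    static final class Row {
        final Integer key;
        final Integer value;

        Row(Integer key, Integer value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Join-condition equality: a null on either side never matches. */
    static boolean sqlEquals(Integer left, Integer right) {
        return left != null && right != null && left.equals(right);
    }

    /** Nested-loop inner join on a.key = b.value, returning matched pairs. */
    static List<Row[]> innerJoin(List<Row> a, List<Row> b) {
        List<Row[]> out = new ArrayList<>();
        for (Row left : a) {
            for (Row right : b) {
                if (sqlEquals(left.key, right.value)) {
                    out.add(new Row[] {left, right});
                }
            }
        }
        return out;
    }
}
```

Joining the rows (NULL, 325) and (18, NULL) with themselves through 
innerJoin yields no pairs, which matches the result the description expects.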

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
