[ https://issues.apache.org/jira/browse/HIVE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amareshwari Sriramadasu updated HIVE-741:
-----------------------------------------

    Attachment: patch-741-2.txt

Patch fixes SMBMapJoinOperator also. I modified compareKeys(ArrayList<Object> k1, ArrayList<Object> k2) to do the following:
{code}
if (hasNullElements(k1) && hasNullElements(k2)) {
  return -1; // just return k1 is smaller than k2
} else if (hasNullElements(k1)) {
  return (0 - k2.size());
} else if (hasNullElements(k2)) {
  return k1.size();
}
... //the existing code.
{code}
Does the above make sense?

Updated the testcase with smb join queries. When I'm running smb join on my local machine (pseudo distributed mode), I'm getting different results. I think that is mostly because of HIVE-1561. Will update the issue with my findings.

> NULL is not handled correctly in join
> -------------------------------------
>
>                 Key: HIVE-741
>                 URL: https://issues.apache.org/jira/browse/HIVE-741
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Ning Zhang
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-741-1.txt, patch-741-2.txt, patch-741.txt, smbjoin_nulls.q.txt
>
>
> With the following data in table input4_cb:
> Key        Value
> ------     --------
> NULL       325
> 18         NULL
> The following query:
> {code}
> select * from input4_cb a join input4_cb b on a.key = b.value;
> {code}
> returns the following result:
> NULL    325    18    NULL
> The correct result should be empty set.
> When 'null' is replaced by '' it works.
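
For reference, the expected behaviour in the description above (a NULL join key never equals anything, including another NULL) can be sketched in plain Java. This is only an illustration under assumptions, not the Hive operator code or the attached patch: the class NullSafeJoinKeys and the methods keysMatch and joinKeyToValue are hypothetical names, rows are modelled as Object[] {key, value}, and a nested-loop join stands in for the real join operators. Only hasNullElements borrows its name from the snippet in the comment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper, not Hive code: SQL-style equality for join keys. */
final class NullSafeJoinKeys {

  /** Returns true if any column of the key is NULL. */
  static boolean hasNullElements(List<Object> key) {
    for (Object column : key) {
      if (column == null) {
        return true;
      }
    }
    return false;
  }

  /** Keys match only when neither contains NULL and all columns are equal. */
  static boolean keysMatch(List<Object> k1, List<Object> k2) {
    if (hasNullElements(k1) || hasNullElements(k2)) {
      return false; // in SQL, NULL = x is never true, even when x is NULL
    }
    return k1.equals(k2);
  }

  /** Inner self-join of {key, value} rows on a.key = b.value (nested loop for clarity). */
  static List<Object[]> joinKeyToValue(List<Object[]> rows) {
    List<Object[]> out = new ArrayList<>();
    for (Object[] a : rows) {
      for (Object[] b : rows) {
        List<Object> aKey = Collections.singletonList(a[0]);
        List<Object> bValue = Collections.singletonList(b[1]);
        if (keysMatch(aKey, bValue)) {
          out.add(new Object[] {a[0], a[1], b[0], b[1]});
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Same two rows as table input4_cb in the issue description.
    List<Object[]> input4cb = new ArrayList<>();
    input4cb.add(new Object[] {null, 325});
    input4cb.add(new Object[] {18, null});
    System.out.println(joinKeyToValue(input4cb).size()); // prints 0
  }
}
```

With the two rows from input4_cb this produces an empty result, which is what the description says the query should return. A sort-merge join additionally needs a consistent ordering for keys that contain NULL, which this sketch does not attempt to cover.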