[ https://issues.apache.org/jira/browse/HIVE-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yin Huai updated HIVE-5697:
---------------------------

    Issue Type: Sub-task  (was: Bug)
        Parent: HIVE-3667

> Correlation Optimizer may generate wrong plans for cases involving outer join
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-5697
>                 URL: https://issues.apache.org/jira/browse/HIVE-5697
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 0.12.0, 0.13.0
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>
> For example,
> {code:sql}
> select x.key, y.value, count(*)
> from src x right outer join src1 y on (x.key=y.key and x.value=y.value)
> group by x.key, y.value;
> {code}
> The Correlation Optimizer will decide that a single MR job is enough for this
> query. However, the group-by keys come from both the left and the right table
> of the right outer join.
> This produces a wrong result such as:
> {code}
> NULL 4
> NULL val_165 1
> NULL val_193 1
> NULL val_265 1
> NULL val_27 1
> NULL val_409 1
> NULL val_484 1
> NULL 1
> 146 val_146 2
> 150 val_150 1
> 213 val_213 2
> NULL 1
> 238 val_238 2
> 255 val_255 2
> 273 val_273 3
> 278 val_278 2
> 311 val_311 3
> NULL 1
> 401 val_401 5
> 406 val_406 4
> 66 val_66 1
> 98 val_98 2
> {code}
> Rows in which both x.key and y.value are null may not be grouped together.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
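A minimal Python sketch of how a single-job plan can split one group, under simplifying assumptions that are not taken from Hive's source: one reducer receives the right-side rows sorted by the join key (key, value), emits the right-outer-join rows, and passes them directly to a sort-based GROUP BY that merges only adjacent rows with equal (x.key, y.value). The function names and toy rows are hypothetical; this illustrates the failure mode and is not Hive's JoinOperator or GroupByOperator.

```python
from collections import Counter
from itertools import groupby
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, str]               # (key, value) of an input table row
Joined = Tuple[Optional[str], str]  # (x.key, y.value); x.key is None (NULL) when unmatched


def reduce_side_right_outer_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """x RIGHT OUTER JOIN y ON (x.key = y.key AND x.value = y.value).

    Assumption: one reducer sees the right-side rows sorted by the join key
    (key, value), as a shuffle on that key would deliver them.
    """
    left_by_join_key: Dict[Row, List[Row]] = {}
    for x in left:
        left_by_join_key.setdefault(x, []).append(x)
    joined: List[Joined] = []
    for y in sorted(right):  # reducer input order = join-key order
        matches = left_by_join_key.get(y, [])
        if matches:
            joined.extend((x[0], y[1]) for x in matches)
        else:
            joined.append((None, y[1]))  # unmatched right row: x.key becomes NULL
    return joined


def streaming_count(rows: List[Joined]) -> List[Tuple[Optional[str], str, int]]:
    """Sort-based GROUP BY x.key, y.value: only *adjacent* equal keys are merged."""
    return [(k[0], k[1], sum(1 for _ in grp)) for k, grp in groupby(rows)]


def hash_count(rows: List[Joined]) -> Dict[Joined, int]:
    """Reference GROUP BY whose result does not depend on input order."""
    return dict(Counter(rows))


if __name__ == "__main__":
    # Hypothetical toy data, not the contents of src/src1.
    x_rows = [("1", "a"), ("2", "b")]
    y_rows = [("1", "a"), ("3", ""), ("4", "b"), ("5", "")]
    joined = reduce_side_right_outer_join(x_rows, y_rows)
    print(streaming_count(joined))
    print(hash_count(joined))
```

With these toy rows, every unmatched right row gets x.key = None, and in join-key order the two rows with y.value = "" are separated by the (None, "b") row. `streaming_count` therefore reports the (None, "") group twice with a count of 1 each, while `hash_count` reports it once with a count of 2, which resembles the repeated `NULL ... 1` lines in the wrong result.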