Yin Huai created HIVE-5697:
------------------------------

             Summary: Correlation Optimizer may generate wrong plans for cases involving outer join
                 Key: HIVE-5697
                 URL: https://issues.apache.org/jira/browse/HIVE-5697
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.12.0, 0.13.0
            Reporter: Yin Huai
            Assignee: Yin Huai

For example,
{code:sql}
select x.key, y.value, count(*)
from src x right outer join src1 y on (x.key=y.key and x.value=y.value)
group by x.key, y.value;
{code}
The Correlation Optimizer determines that a single MR job is enough for this query. However, the group by keys come from both the left and right tables of the right outer join, so the single-job plan is not safe here. We get a wrong result like
{code}
NULL 4
NULL val_165 1
NULL val_193 1
NULL val_265 1
NULL val_27 1
NULL val_409 1
NULL val_484 1
NULL 1
146 val_146 2
150 val_150 1
213 val_213 2
NULL 1
238 val_238 2
255 val_255 2
273 val_273 3
278 val_278 2
311 val_311 3
NULL 1
401 val_401 5
406 val_406 4
66 val_66 1
98 val_98 2
{code}
Rows in which both x.key and y.value are null may not be grouped together.
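
To illustrate why the single-job plan can split one group into several output rows, here is a small Python sketch. It is a simplified model and not Hive's actual execution code. The tables `X_ROWS` and `Y_ROWS`, the helper names, and the assumption that the merged job sees rows ordered by the join key of the preserved (right) side are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import groupby

# Hypothetical data, NOT the real src/src1 tables: rows are (key, value).
X_ROWS = [("20", "w")]
Y_ROWS = [("10", "v"), ("20", "w"), ("30", "v")]


def right_outer_join(x_rows, y_rows):
    """x RIGHT OUTER JOIN y ON key AND value; returns (x.key, y.key, y.value)."""
    joined = []
    for y_key, y_value in y_rows:
        matches = [x for x in x_rows if x == (y_key, y_value)]
        if matches:
            joined.extend((x_key, y_key, y_value) for x_key, _ in matches)
        else:
            # No match on the left side: x.key is NULL (None here).
            joined.append((None, y_key, y_value))
    return joined


def single_job_group_by(joined):
    """Assumed model of the merged plan: rows stay in join-key order and the
    aggregation emits a group whenever the group by key changes."""
    ordered = sorted(joined, key=lambda r: (r[1], r[2]))  # order by y.key, y.value
    return [
        (group_key, sum(1 for _ in rows))
        for group_key, rows in groupby(ordered, key=lambda r: (r[0], r[2]))
    ]


def reference_group_by(joined):
    """Correct result: count per (x.key, y.value) regardless of row order."""
    return dict(Counter((r[0], r[2]) for r in joined))


if __name__ == "__main__":
    joined = right_outer_join(X_ROWS, Y_ROWS)
    # The group (None, 'v') is emitted twice because its rows are not adjacent.
    print(single_job_group_by(joined))
    # Grouping on the real group by key merges them into one row with count 2.
    print(reference_group_by(joined))
```

In this model, the two unmatched rows share the group by key (NULL, v) but have different join keys, so ordering by the join key does not place them next to each other. That mirrors the repeated NULL rows in the result above.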