[ 
https://issues.apache.org/jira/browse/HIVE-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-5697:
---------------------------

    Attachment: HIVE-5697.2.patch

Added a test query.

> Correlation Optimizer may generate wrong plans for cases involving outer join
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-5697
>                 URL: https://issues.apache.org/jira/browse/HIVE-5697
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 0.12.0, 0.13.0
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>         Attachments: HIVE-5697.1.patch, HIVE-5697.2.patch
>
>
> For example,
> {code:sql}
> select x.key, y.value, count(*)
> from src x right outer join src1 y
>   on (x.key = y.key and x.value = y.value)
> group by x.key, y.value;
> {code}
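> The Correlation Optimizer determines that a single MR job is enough for this 
> query. However, the group by keys come from both the left and right tables of 
> the right outer join, so x.key can be NULL in the join output for rows that 
> did not share the same join key. 
> The single-job plan then produces a wrong result such as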
> {code}
> NULL          4
> NULL  val_165 1
> NULL  val_193 1
> NULL  val_265 1
> NULL  val_27  1
> NULL  val_409 1
> NULL  val_484 1
> NULL          1
> 146   val_146 2
> 150   val_150 1
> 213   val_213 2
> NULL          1
> 238   val_238 2
> 255   val_255 2
> 273   val_273 3
> 278   val_278 2
> 311   val_311 3
> NULL          1
> 401   val_401 5
> 406   val_406 4
> 66    val_66  1
> 98    val_98  2
> {code}
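> Rows in which both x.key and y.value are null (shown above as NULL followed 
> by a blank value) may not be grouped together, so they appear as several 
> separate rows instead of one.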
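>
> A possible workaround, sketched under the assumption that the wrong plan only 
> appears when the Correlation Optimizer is enabled through the 
> hive.optimize.correlation setting: turn the optimizer off for the session 
> before running the query, so that the join and the group by are expected to be 
> planned as separate MR jobs. This is not part of the attached patches.
> {code:sql}
> -- Assumption: the Correlation Optimizer is controlled by this setting.
> set hive.optimize.correlation=false;
> -- Same query as in the example above.
> select x.key, y.value, count(*)
> from src x right outer join src1 y
>   on (x.key = y.key and x.value = y.value)
> group by x.key, y.value;
> {code}
> Comparing this output with the output obtained under 
> hive.optimize.correlation=true is one way to check whether a given query is 
> affected.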



