Yin Huai created HIVE-5697:
------------------------------

             Summary: Correlation Optimizer may generate wrong plans for cases 
involving outer join
                 Key: HIVE-5697
                 URL: https://issues.apache.org/jira/browse/HIVE-5697
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.12.0, 0.13.0
            Reporter: Yin Huai
            Assignee: Yin Huai


For example,
{code:sql}
select x.key, y.value, count(*) from src x right outer join src1 y on 
(x.key=y.key and x.value=y.value) group by x.key, y.value; 
{code}
Correlation Optimizer will determine that a single MR job is enough for this 
query. However, the group by keys come from both the left and right tables of 
the right outer join.

This gives a wrong result like the following:
{code}
NULL            4
NULL    val_165 1
NULL    val_193 1
NULL    val_265 1
NULL    val_27  1
NULL    val_409 1
NULL    val_484 1
NULL            1
146     val_146 2
150     val_150 1
213     val_213 2
NULL            1
238     val_238 2
255     val_255 2
273     val_273 3
278     val_278 2
311     val_311 3
NULL            1
401     val_401 5
406     val_406 4
66      val_66  1
98      val_98  2
{code}
Rows in which both x.key and y.value are NULL may not be grouped together.
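
To compare the two plans, the same query can be run with the optimizer on and 
off. This is only a sketch: it assumes the optimizer is toggled by the 
hive.optimize.correlation setting and that src and src1 are the standard test 
tables used above.
{code:sql}
-- Assumption: hive.optimize.correlation switches Correlation Optimizer on/off.
-- Enabled: the query is expected to be planned as a single MR job (the case above).
set hive.optimize.correlation=true;
explain
select x.key, y.value, count(*) from src x right outer join src1 y on 
(x.key=y.key and x.value=y.value) group by x.key, y.value;

-- Disabled: the join and the group by should be planned as separate jobs,
-- which gives a baseline result to compare against.
set hive.optimize.correlation=false;
select x.key, y.value, count(*) from src x right outer join src1 y on 
(x.key=y.key and x.value=y.value) group by x.key, y.value;
{code}
In the baseline output, each (x.key, y.value) pair, including the pair where 
both are NULL, should appear only once.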



--
This message was sent by Atlassian JIRA
(v6.1#6144)
