[jira] [Updated] (HIVE-10607) Combination of ReducesinkDedup + TopN optimization yields incorrect result if there are multiple GBY in reducer

Ashutosh Chauhan (JIRA) Mon, 04 May 2015 22:31:09 -0700

     [ https://issues.apache.org/jira/browse/HIVE-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ashutosh Chauhan updated HIVE-10607:
------------------------------------
    Attachment: HIVE-10607.patch

It turns out the problem is independent of Tez vs. MR. In MR, too, a reducer 
can have multiple RS operators in its pipeline.
The attached patch is conservative: it detects this case and turns the 
optimization off.

> Combination of ReducesinkDedup + TopN optimization yields incorrect result if 
> there are multiple GBY in reducer
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-10607
>                 URL: https://issues.apache.org/jira/browse/HIVE-10607
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer, Tez
>    Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.1.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>         Attachments: HIVE-10607.patch
>
>
> {code:sql}
> select ctinyint, count(cdouble)
> from (select ctinyint, cdouble from alltypesorc group by ctinyint, cdouble) t1
> group by ctinyint
> order by ctinyint
> limit 20;
> {code}
> This gives a different result set depending on which optimizations are 
> on. In particular, in the .q test environment, the following two invocations 
> give different result sets:
> {code}
> *   mvn test -Phadoop-2 -Dtest.output.overwrite=true -Dtest=TestMiniTezCliDriver -Dqfile=test.q -Dhive.optimize.reducededuplication.min.reducer=1 -Dhive.limit.pushdown.memory.usage=0.3f
> *   mvn test -Phadoop-2 -Dtest.output.overwrite=true -Dtest=TestMiniTezCliDriver -Dqfile=test.q
> {code}
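
A toy Python model may help show how a limit applied at the wrong point in a pipeline with two GBYs can change the counts. This is only a sketch, not Hive code: the function names and sample rows are made up for illustration, and it assumes (as one plausible reading of the bug, not something confirmed here) that the TopN limit ends up truncating the rows produced by the inner GBY instead of the final result. It also assumes cdouble has no NULLs, so count(cdouble) is a plain row count.

```python
from collections import Counter

def inner_group(rows):
    """Inner GBY: distinct (ctinyint, cdouble) pairs, in key order."""
    return sorted(set(rows))

def outer_count(pairs):
    """Outer GBY: count cdouble values per ctinyint, in key order."""
    counts = Counter(ctinyint for ctinyint, _ in pairs)
    return sorted(counts.items())

def limit_at_end(rows, limit):
    """Intended semantics: aggregate fully, then keep the first `limit` groups."""
    return outer_count(inner_group(rows))[:limit]

def limit_too_early(rows, limit):
    """Hypothetical faulty plan: the limit truncates the inner GBY output,
    so the outer GBY only sees a prefix of the rows it should count."""
    return outer_count(inner_group(rows)[:limit])[:limit]

# Made-up sample data standing in for alltypesorc(ctinyint, cdouble).
rows = [(1, 1.0), (1, 2.0), (1, 3.0), (2, 1.0), (2, 2.0), (3, 1.0)]

print(limit_at_end(rows, 2))     # [(1, 3), (2, 2)]
print(limit_too_early(rows, 2))  # [(1, 2)]
```

In this model the early limit both undercounts ctinyint = 1 and drops ctinyint = 2 entirely, which is the kind of discrepancy the two invocations above would expose.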



