[
https://issues.apache.org/jira/browse/HIVE-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ashutosh Chauhan updated HIVE-10607:
------------------------------------
Attachment: HIVE-10607.patch
Turns out the problem is independent of Tez vs. MR. In MR, too, a reducer can
have multiple RS in its pipeline.
Attached is a conservative patch that detects this case and turns the
optimization off when it occurs.
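A toy sketch of why pruning at the sink is unsafe when two group-bys run in the
same reducer. This is not Hive code. The function names and sample rows are
hypothetical, and it assumes the pruning keeps only the first N distinct
(ctinyint, cdouble) keys, which is one plausible reading of how the two
optimizations interact.

```python
from collections import Counter


def two_level_count(pairs, limit):
    """Inner GROUP BY (ctinyint, cdouble), outer GROUP BY ctinyint, then ORDER BY/LIMIT."""
    distinct_pairs = set(pairs)                        # inner group-by removes duplicate pairs
    counts = Counter(t for t, _ in distinct_pairs)     # outer group-by counts cdouble per ctinyint
    return sorted(counts.items())[:limit]              # order by ctinyint, limit N


def with_topn_at_sink(pairs, limit):
    """Assumed faulty plan: keep only the first N distinct sink keys before both group-bys."""
    kept = sorted(set(pairs))[:limit]                  # top-N on the wider (ctinyint, cdouble) key
    return two_level_count(kept, limit)


# Hypothetical sample rows: (ctinyint, cdouble)
rows = [(1, 1.0), (1, 2.0), (1, 3.0), (2, 1.0), (2, 2.0), (3, 1.0)]

expected = two_level_count(rows, 2)
pruned = with_topn_at_sink(rows, 2)

print(expected)
print(pruned)
```

In this toy run the pruned plan loses rows that the outer group-by still needs,
so both the counts and the number of groups differ from the unpruned result.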
> Combination of ReducesinkDedup + TopN optimization yields incorrect result if
> there are multiple GBY in reducer
> ---------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-10607
> URL: https://issues.apache.org/jira/browse/HIVE-10607
> Project: Hive
> Issue Type: Bug
> Components: Logical Optimizer, Tez
> Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.1.0
> Reporter: Ashutosh Chauhan
> Assignee: Ashutosh Chauhan
> Attachments: HIVE-10607.patch
>
>
> {code:sql}
> select ctinyint, count(cdouble) from (select ctinyint, cdouble from
> alltypesorc group by ctinyint, cdouble) t1 group by ctinyint order by
> ctinyint limit 20;
> {code}
> This gives a different result set depending on which optimizations are
> on. In particular, in the .q test environment the following two invocations
> will give you different result sets:
> {code}
> * mvn test -Phadoop-2 -Dtest.output.overwrite=true
> -Dtest=TestMiniTezCliDriver -Dqfile=test.q
> -Dhive.optimize.reducededuplication.min.reducer=1
> -Dhive.limit.pushdown.memory.usage=0.3f
> * mvn test -Phadoop-2 -Dtest.output.overwrite=true
> -Dtest=TestMiniTezCliDriver -Dqfile=test.q
> {code}