[
https://issues.apache.org/jira/browse/HIVE-17390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141708#comment-16141708
]
Khaja Hussain commented on HIVE-17390:
--------------------------------------
Thanks, Brian, for filing the bug.
> Select count(distinct) returns incorrect results using tez
> ----------------------------------------------------------
>
> Key: HIVE-17390
> URL: https://issues.apache.org/jira/browse/HIVE-17390
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Affects Versions: 1.2.1
> Reporter: Brian Goerlitz
>
> With the following combination of settings, select count(distinct) will
> return the results of select sum(distinct).
> hive.execution.engine=tez
> hive.optimize.reducededuplication=true
> hive.optimize.reducededuplication.min.reducer=1
> hive.optimize.distinct.rewrite=true
> hive.groupby.skewindata=false
> hive.vectorized.execution.reduce.enabled=true
> STEPS TO REPRODUCE:
> {quote}CREATE TABLE `simple_data`(ppmonth int, sale double);
> INSERT INTO simple_data VALUES
> (501,25000.0),(502,60000.0),(501,40000.0),(502,70000.0),(501,35000.0),(502,60000.0);
> set hive.execution.engine=tez;
> set hive.optimize.reducededuplication=true;
> set hive.optimize.reducededuplication.min.reducer=1;
> set hive.optimize.distinct.rewrite=true;
> set hive.groupby.skewindata=false;
> set hive.vectorized.execution.reduce.enabled=true;
> select count(distinct ppmonth) from simple_data;{quote}
> Returns 1003 (501 + 502, the sum of the distinct ppmonth values) rather than the expected 2.
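
As a sanity check on the numbers in the report, the plain-Python sketch below recomputes both aggregates from the six inserted rows. It runs independently of Hive, and the variable names are illustrative rather than taken from the issue. It only shows what each aggregate should produce on this data and says nothing about where in the query plan the mix-up occurs.

```python
# Rows from the INSERT statement in the report: (ppmonth, sale)
rows = [
    (501, 25000.0), (502, 60000.0), (501, 40000.0),
    (502, 70000.0), (501, 35000.0), (502, 60000.0),
]

# Distinct ppmonth values, as a DISTINCT aggregate would see them
distinct_months = {ppmonth for ppmonth, _sale in rows}

# What count(distinct ppmonth) is expected to return
count_distinct = len(distinct_months)

# What sum(distinct ppmonth) would return on the same data
sum_distinct = sum(distinct_months)

print(count_distinct, sum_distinct)
```

The second value matches the figure quoted in the report, which is consistent with the description that the query returns the result of sum(distinct) instead of count(distinct).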