[ https://issues.apache.org/jira/browse/HIVE-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HIVE-5657:
------------------------------
Attachment: D13797.1.patch
navis requested code review of "HIVE-5657 [jira] TopN produces incorrect
results with count(distinct)".
Reviewers: JIRA
HIVE-5657 TopN produces incorrect results with count(distinct)
The attached patch illustrates the problem.
The limit_pushdown test has various other cases of aggregations and distincts,
incl. count-distinct, that work correctly (that said, the src dataset is bad for
testing these things because every count, for example, produces only one
record), so something must be special about this case.
I am not very familiar with the distinct code and its nuances; if someone knows
a quick fix, feel free to take this; otherwise I will probably start looking
next week.
TEST PLAN
EMPTY
REVISION DETAIL
https://reviews.facebook.net/D13797
AFFECTED FILES
ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorReduceSinkOperator.java
ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java
ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
ql/src/test/queries/clientpositive/limit_pushdown.q
ql/src/test/queries/clientpositive/limit_pushdown_negative.q
ql/src/test/results/clientpositive/limit_pushdown.q.out
ql/src/test/results/clientpositive/limit_pushdown_negative.q.out
serde/src/java/org/apache/hadoop/hive/serde2/KeySerializer.java
serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java
serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/OutputByteBuffer.java
To: JIRA, navis
> TopN produces incorrect results with count(distinct)
> ----------------------------------------------------
>
> Key: HIVE-5657
> URL: https://issues.apache.org/jira/browse/HIVE-5657
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Navis
> Priority: Critical
> Attachments: D13797.1.patch, example.patch, HIVE-5657.1.patch.txt