[ 
https://issues.apache.org/jira/browse/HIVE-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809848#comment-13809848
 ] 

Sergey Shelukhin commented on HIVE-5657:
----------------------------------------

Oh, I see how it works now. Logically, it is the same behavior as groupby, but 
groupby forwards all MR KVs for a key except one, which is kept in the heap and 
later flushed if it "survives" there, or dropped if it is evicted. This code, on 
the other hand, would have to store multiple key-values for each key in the case 
of multi-distinct, so it just forwards every row whose key is in the heap, 
without storing that one row. 
That part makes sense. Thanks! Function name ("index") could be improved.
Are you going to post an updated patch?
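
To make sure I am describing the distinct path correctly, here is a minimal 
sketch of it as I understand it. This is not the code from the patch: the class 
and method names (TopNKeySketch, shouldForward) are made up for illustration, 
keys are plain strings, and a TreeSet stands in for the real heap. It only 
tracks the N smallest keys and forwards a row immediately when its key is 
currently tracked, never buffering the row itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch, not Hive's implementation: tracks the N smallest keys
// and decides per row whether to forward it, without storing any rows.
class TopNKeySketch {
    private final int limit;
    private final TreeSet<String> topKeys = new TreeSet<>();

    TopNKeySketch(int limit) {
        this.limit = limit;
    }

    // Returns true if a row with this key should be forwarded right away.
    boolean shouldForward(String key) {
        if (topKeys.contains(key)) {
            return true;                  // key already tracked: forward every row
        }
        if (topKeys.size() < limit) {
            topKeys.add(key);             // room left: start tracking this key
            return true;
        }
        if (key.compareTo(topKeys.last()) < 0) {
            topKeys.pollLast();           // evict the current largest key
            topKeys.add(key);
            return true;
        }
        return false;                     // key cannot be in the top N: drop the row
    }

    public static void main(String[] args) {
        TopNKeySketch filter = new TopNKeySketch(2);
        String[][] rows = {
            {"b", "1"}, {"c", "1"}, {"b", "2"}, {"a", "1"}, {"c", "2"}, {"a", "2"}
        };
        List<String> forwarded = new ArrayList<>();
        for (String[] row : rows) {
            if (filter.shouldForward(row[0])) {
                forwarded.add(row[0] + ":" + row[1]);
            }
        }
        System.out.println(forwarded);    // prints [b:1, c:1, b:2, a:1, a:2]
    }
}
```

Note that c:1 is forwarded before "c" gets evicted. In this sketch that is 
harmless as long as the consumer still applies the limit itself, since the 
filter only ever forwards too many rows, never too few.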

> TopN produces incorrect results with count(distinct)
> ----------------------------------------------------
>
>                 Key: HIVE-5657
>                 URL: https://issues.apache.org/jira/browse/HIVE-5657
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Navis
>            Priority: Critical
>         Attachments: D13797.1.patch, example.patch, HIVE-5657.1.patch.txt
>
>
> Attached patch illustrates the problem.
> limit_pushdown test has various other cases of aggregations and distincts, 
> incl. count-distinct, that work correctly (that said, src dataset is bad for 
> testing these things because every count, for example, produces one record 
> only), so something must be special about this.
> I am not very familiar with the distinct code and these nuances; if someone 
> knows a quick fix feel free to take this, otherwise I will probably start 
> looking next week. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)
