[ https://issues.apache.org/jira/browse/HIVE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876809#action_12876809 ]

Mayank Lahiri commented on HIVE-1387:
-------------------------------------

The current implementation of percentile() seems to buffer the entire input 
stream as an exact histogram, i.e. one count per distinct value. This is 
likely to run out of memory on massive datasets, and even more so with doubles 
than with longs: distinct double values repeat far less often, so aggregating 
multiple identical values into counts saves much less space.
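To make the memory argument concrete, here is a minimal Python sketch, assuming 
the exact histogram is a map from each distinct value to its count. The helper 
`exact_percentile` and its linear-interpolation rule are hypothetical, not 
Hive's code. Memory tracks the number of distinct keys: 100,000 longs drawn 
from 100 values collapse to 100 entries, while 100,000 distinct doubles need 
100,000.

```python
from collections import Counter
import math


def exact_percentile(counts, p):
    """Exact p-th percentile (0 <= p <= 1) from a value -> count histogram.

    Hypothetical helper for illustration. Interpolates linearly between the
    two closest ranks; that convention is an assumption here.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    total = sum(counts.values())
    if total == 0:
        return None
    pos = p * (total - 1)                  # fractional 0-based rank
    lo_rank, hi_rank = math.floor(pos), math.ceil(pos)
    lo_val = hi_val = None
    seen = 0
    for value in sorted(counts):           # one entry per distinct value
        seen += counts[value]
        if lo_val is None and seen > lo_rank:
            lo_val = value
        if seen > hi_rank:
            hi_val = value
            break
    return lo_val + (pos - lo_rank) * (hi_val - lo_val)


# Memory is proportional to the number of distinct keys, not the row count.
longs = Counter(i % 100 for i in range(100_000))       # 100 distinct keys
doubles = Counter(i * 0.001 for i in range(100_000))   # 100,000 distinct keys
print(len(longs), len(doubles))                        # 100 100000
print(exact_percentile(longs, 0.5))                    # 49.5
```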

I'm currently working on a constant-space histogram UDAF. Estimating 
percentiles from such a histogram might be a better option on larger datasets. 
Alternatively, we might consider splitting this functionality into 
percentile_exact() and percentile_approx() UDAFs, using this version and the 
constant-space histogram approximation respectively.
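A constant-space alternative can be sketched as below. This is an assumption 
about one possible design, not the UDAF described above: it follows the 
merge-closest-bins idea from Ben-Haim and Tom-Tov's streaming histogram, and 
the class `StreamingHistogram` and its `max_bins` parameter are hypothetical. 
Each insert adds a bin, and whenever there are more than `max_bins` bins the 
two adjacent bins with the closest centroids are merged into their 
count-weighted mean.

```python
import bisect
import math


class StreamingHistogram:
    """Constant-space histogram holding at most max_bins [centroid, count] bins.

    Illustrative sketch of the merge-closest-bins scheme (Ben-Haim & Tom-Tov);
    the class and method names are hypothetical.
    """

    def __init__(self, max_bins=50):
        if max_bins < 2:
            raise ValueError("max_bins must be >= 2")
        self.max_bins = max_bins
        self.bins = []  # [centroid, count] pairs, sorted by centroid

    def add(self, x, count=1):
        centroids = [c for c, _ in self.bins]
        i = bisect.bisect_left(centroids, x)
        if i < len(self.bins) and self.bins[i][0] == x:
            self.bins[i][1] += count       # repeated value: bump its count
            return
        self.bins.insert(i, [float(x), count])
        if len(self.bins) > self.max_bins:
            self._merge_closest()

    def _merge_closest(self):
        # Replace the adjacent pair with the smallest centroid gap by its
        # count-weighted mean; this keeps memory bounded by max_bins.
        j = min(range(len(self.bins) - 1),
                key=lambda k: self.bins[k + 1][0] - self.bins[k][0])
        (c1, n1), (c2, n2) = self.bins[j], self.bins[j + 1]
        self.bins[j:j + 2] = [[(c1 * n1 + c2 * n2) / (n1 + n2), n1 + n2]]

    def percentile(self, p):
        # Simplification: treat each bin's mass as sitting at its centroid,
        # then interpolate linearly between the two closest ranks.
        if not 0.0 <= p <= 1.0:
            raise ValueError("p must be in [0, 1]")
        total = sum(n for _, n in self.bins)
        if total == 0:
            return None
        pos = p * (total - 1)
        lo_rank, hi_rank = math.floor(pos), math.ceil(pos)
        lo_val = hi_val = None
        seen = 0
        for c, n in self.bins:
            seen += n
            if lo_val is None and seen > lo_rank:
                lo_val = c
            if seen > hi_rank:
                hi_val = c
                break
        return lo_val + (pos - lo_rank) * (hi_val - lo_val)


h = StreamingHistogram(max_bins=10)
for i in range(1000):
    h.add(i * 0.5)            # 1,000 distinct doubles
print(len(h.bins))            # 10: space stays bounded
print(h.percentile(0.5))      # approximate median of 0.0 .. 499.5
```

Merging the closest pair keeps resolution where the data is dense, and the 
count-weighted mean preserves the overall sum. The `percentile()` above places 
each bin's mass at its centroid, which is cruder than the interpolation in the 
original paper, so its answers are approximate once any merge has happened.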

> Make PERCENTILE work with double data type
> ------------------------------------------
>
>                 Key: HIVE-1387
>                 URL: https://issues.apache.org/jira/browse/HIVE-1387
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Vaibhav Aggarwal
>         Attachments: patch-1387-1.patch
>
>
> The PERCENTILE UDAF does not work with the double data type.

-- 
This message is automatically generated by JIRA.