[
https://issues.apache.org/jira/browse/HIVE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876809#action_12876809
]
Mayank Lahiri commented on HIVE-1387:
-------------------------------------
The current implementation of percentile() seems to buffer the entire input
stream as an exact histogram. This is likely to choke on massive datasets, even
more so with doubles instead of longs, where aggregating multiple values as
counts can have even less of an effect than with longs.
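
To illustrate the concern, here is a minimal Python sketch of an exact
percentile computed from a value-to-count map. It is not the Hive
implementation: the function name exact_percentile and the nearest-rank rule
are assumptions made for this example only.

```python
import math
from collections import Counter

def exact_percentile(values, p):
    """Exact percentile from a value -> count map.

    Uses the nearest-rank rule (an assumption for this sketch).
    Memory grows with the number of DISTINCT values seen.
    """
    counts = Counter(values)              # one entry per distinct value
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no input values")
    rank = max(1, math.ceil(p * total))   # 1-based rank of the target element
    running = 0
    for v in sorted(counts):
        running += counts[v]
        if running >= rank:
            return v

# Longs with many repeats collapse into few map entries...
longs = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
print(exact_percentile(longs, 0.5), len(Counter(longs)))

# ...while doubles rarely repeat, so the map is as large as the input.
doubles = [i * 0.37 for i in range(10)]
print(len(Counter(doubles)))
```

With the repeated longs, ten inputs shrink to four map entries; with the
doubles, every input keeps its own entry, so counting saves nothing.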
I'm currently working on a constant-space histogram UDAF. Estimating
percentiles from such a histogram might be a better option on larger datasets.
Alternatively, we might consider splitting this functionality into
percentile_exact() and percentile_approx() UDAFs, using this version and the
constant-space histogram approximation respectively.
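
As a rough sketch of what estimating a percentile from a constant-space
histogram could look like, the Python below keeps at most max_bins
(centroid, count) pairs and merges the two closest bins whenever the limit is
exceeded. This is one possible technique chosen for illustration, not
necessarily the design of the UDAF mentioned above; the class name
FixedSizeHistogram, its methods, and the no-interpolation estimate are all
assumptions.

```python
import bisect

class FixedSizeHistogram:
    """Streaming histogram holding at most max_bins [centroid, count] bins."""

    def __init__(self, max_bins):
        self.max_bins = max_bins
        self.bins = []  # kept sorted by centroid

    def add(self, x):
        keys = [b[0] for b in self.bins]
        i = bisect.bisect_left(keys, x)
        if i < len(self.bins) and self.bins[i][0] == x:
            self.bins[i][1] += 1          # exact hit: just bump the count
            return
        self.bins.insert(i, [float(x), 1])
        if len(self.bins) > self.max_bins:
            self._merge_closest()

    def _merge_closest(self):
        # Find the adjacent pair with the smallest gap between centroids.
        j = min(range(len(self.bins) - 1),
                key=lambda k: self.bins[k + 1][0] - self.bins[k][0])
        (c1, n1), (c2, n2) = self.bins[j], self.bins[j + 1]
        # Replace the pair with one bin at the count-weighted mean.
        self.bins[j:j + 2] = [[(c1 * n1 + c2 * n2) / (n1 + n2), n1 + n2]]

    def percentile(self, p):
        """Approximate percentile: centroid of the bin containing rank p."""
        if not self.bins:
            raise ValueError("no input values")
        total = sum(n for _, n in self.bins)
        target = p * total
        running = 0
        for c, n in self.bins:
            running += n
            if running >= target:
                return c
        return self.bins[-1][0]

h = FixedSizeHistogram(max_bins=2)
for x in (1, 2, 10):
    h.add(x)
print(h.bins)             # two bins, regardless of input size
print(h.percentile(0.5))
```

Memory stays bounded by max_bins no matter how many rows are aggregated, at
the cost of returning an estimate rather than an exact value.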
> Make PERCENTILE work with double data type
> ------------------------------------------
>
> Key: HIVE-1387
> URL: https://issues.apache.org/jira/browse/HIVE-1387
> Project: Hadoop Hive
> Issue Type: Improvement
> Reporter: Vaibhav Aggarwal
> Attachments: patch-1387-1.patch
>
>
> The PERCENTILE UDAF does not work with double datatype.