[ 
https://issues.apache.org/jira/browse/HIVE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879894#action_12879894
 ] 

Mayank Lahiri commented on HIVE-1387:
-------------------------------------

This is what I suggest we do to resolve this issue:

1. Create a new percentile_approx() function that extends 
GenericUDAFHistogramNumeric to build a fine-grained approximate histogram with 
many bins (say 10,000, though I'll run some experiments to pick a number), and 
then uses that histogram to estimate the percentile value.

2. Convert the existing simple percentile() UDAF to a generic UDAF. When the 
input is byte, short, int, or long, use the existing code (with some 
modifications, such as replacing the linear scan with a binary search). When 
the input is float or double, automatically dispatch to the percentile_approx() 
function.
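
To make the two steps above concrete, here is a rough standalone sketch in 
Java. It is illustrative only and is not the patch code: the class and method 
names (PercentileSketch, approxPercentile, exactPercentile, 
firstIndexCovering) are hypothetical, the histogram is a simple two-pass 
equal-width one standing in for whatever GenericUDAFHistogramNumeric 
produces, and the exact path assumes the input has already been reduced to 
sorted distinct values with cumulative counts and that percentiles are 
linearly interpolated between the two nearest ranks.

```java
import java.util.Arrays;

/** Hypothetical illustration of the two proposed code paths; not Hive code. */
public class PercentileSketch {

    /**
     * Step 1 (float/double input): estimate the p-th percentile (p in [0, 1])
     * from an equal-width histogram. Assumes a non-empty array. The equal-width
     * binning is an assumption made to keep the sketch short.
     */
    public static double approxPercentile(double[] values, int numBins, double p) {
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        if (min == max) {
            return min; // all values identical, nothing to interpolate
        }
        double width = (max - min) / numBins;
        long[] counts = new long[numBins];
        for (double v : values) {
            // Clamp so that the maximum value falls into the last bin.
            int bin = (int) Math.min(numBins - 1, (long) ((v - min) / width));
            counts[bin]++;
        }
        double target = p * values.length; // number of values at or below the answer
        long cumulative = 0;
        for (int i = 0; i < numBins; i++) {
            if (counts[i] > 0 && cumulative + counts[i] >= target) {
                // Interpolate linearly inside the bin that crosses the target.
                double fraction = (target - cumulative) / counts[i];
                return min + (i + fraction) * width;
            }
            cumulative += counts[i];
        }
        return max;
    }

    /**
     * Step 2 (byte/short/int/long input): exact percentile over sorted distinct
     * values, where cumulative[i] is the number of rows with value <= distinct[i].
     * The interpolation convention here is an assumption.
     */
    public static double exactPercentile(long[] distinct, long[] cumulative, double p) {
        long total = cumulative[cumulative.length - 1];
        double position = p * (total - 1); // zero-based fractional rank
        long lower = (long) Math.floor(position);
        long upper = (long) Math.ceil(position);
        long lowerValue = distinct[firstIndexCovering(cumulative, lower)];
        long upperValue = distinct[firstIndexCovering(cumulative, upper)];
        return lowerValue + (position - lower) * (upperValue - lowerValue);
    }

    /** Binary search for the smallest index i with cumulative[i] > rank. */
    public static int firstIndexCovering(long[] cumulative, long rank) {
        int lo = 0;
        int hi = cumulative.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cumulative[mid] > rank) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }
}
```

For example, with distinct values {1, 2, 3, 4} each seen once (cumulative 
counts {1, 2, 3, 4}), exactPercentile(..., 0.5) interpolates between 2 and 3. 
In approxPercentile the estimate always lies inside the bin that contains the 
true rank, so the error is bounded by one bin width, which is why a large bin 
count matters for the approximate path.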

> Make PERCENTILE work with double data type
> ------------------------------------------
>
>                 Key: HIVE-1387
>                 URL: https://issues.apache.org/jira/browse/HIVE-1387
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Vaibhav Aggarwal
>            Assignee: Mayank Lahiri
>         Attachments: patch-1387-1.patch
>
>
> The PERCENTILE UDAF does not work with double datatype.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
