[jira] [Updated] (HIVE-7177) percentile_approx very inaccurate with high multiplicities in the data

Tom Temple (JIRA) Mon, 09 Jun 2014 14:05:30 -0700

     [ https://issues.apache.org/jira/browse/HIVE-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Tom Temple updated HIVE-7177:
-----------------------------

    Environment: Redhat 5.10 running Cloudera 5.0.1  (was: Redhat 5.10 running Cloudera 5.0.0)

> percentile_approx very inaccurate with high multiplicities in the data
> ----------------------------------------------------------------------
>
>                 Key: HIVE-7177
>                 URL: https://issues.apache.org/jira/browse/HIVE-7177
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 0.12.0
>         Environment: Redhat 5.10 running Cloudera 5.0.1
>            Reporter: Tom Temple
>
> To reproduce:
> 1) Create a table with a single integer column.
> 2) Populate it with the values 1000000, 2000000, 3000000, and 4000000, each
> repeated 250,000 times (1,000,000 rows in total).
> 3) Run percentile_approx(cast(col_0 as double), array(0.33,0.34),1000000).
> Expected results: [2000000.0,2000000.0]
> Actual results: [1280000.0,1320000.0] (I might be off by 40000 here)
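
As a sanity check on the expected values above, here is a minimal Python sketch that computes exact percentiles over the same data. It assumes a nearest-rank definition of the percentile and does not reproduce Hive's histogram-based percentile_approx algorithm. The helper name exact_percentile is hypothetical and used only for illustration.

```python
import math

def exact_percentile(sorted_values, p):
    """Nearest-rank percentile over an already sorted list (an assumed definition)."""
    n = len(sorted_values)
    # Rank is 1-based; clamp to 1 so that p = 0 maps to the smallest value.
    rank = max(1, math.ceil(p * n))
    return sorted_values[rank - 1]

# Four distinct values, each repeated 250,000 times, as in the reproduction steps.
values = sorted(
    float(v)
    for v in (1_000_000, 2_000_000, 3_000_000, 4_000_000)
    for _ in range(250_000)
)

results = [exact_percentile(values, p) for p in (0.33, 0.34)]
print(results)  # [2000000.0, 2000000.0]
```

Ranks 330,000 and 340,000 both fall in the second block of 250,000 rows, so any exact percentile definition that returns an observed value gives 2000000.0 for both quantiles. The reported values near 1.3 million look like interpolation between the distinct values 1000000 and 2000000, though that reading is an inference from the numbers and is not confirmed in this report.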



--
This message was sent by Atlassian JIRA
(v6.2#6252)
