While Impala does not have percentile_disc() It is possible to use
cume_dist() to calculate it like such.
That JIRA is https://issues.apache.org/jira/browse/IMPALA-4025
Here is a Postgres example but the latter works on Impala.
select
percentile_disc(0.95) within group (order by id) as ptile95,
percentile_disc(0.99) within group (order by id) as ptile99
from generate_series(1,100) as t(id);
ptile95 | ptile99
---------+---------
95 | 99
(1 row)
with t1 as(
select
id,
cume_dist() over (order by id) as cd
from generate_series(1,100) as t(id)
)
select
min(case when cd >= 0.95 then id end) as ptile95,
min(case when cd >= 0.99 then id end) as ptile99
from t1;
ptile95 | ptile99
---------+---------
95 | 99
(1 row)
On Tue, Aug 29, 2017 at 11:17 AM, Lars Volker <[email protected]> wrote:
> Impala currently doesn't have percentile functions. Work on those is
> tracked in IMPALA-3602 and contributions are always welcome. You might be
> able to get what you need using analytic functions, e.g. NTILE:
> https://www.cloudera.com/documentation/enterprise/
> latest/topics/impala_analytic_functions.html#ntile
>
> On Tue, Aug 29, 2017 at 8:39 AM, Gayathri Devi <[email protected]
> >
> wrote:
>
> > Hi,
> >
> > I have a table on hbase i want to query on impala to calculate 95 th and
> 99
> > th percentile? any built in function available or user defined function?
> >
>