While Impala does not have percentile_disc() It is possible to use
cume_dist() to calculate it like such.
That JIRA is https://issues.apache.org/jira/browse/IMPALA-4025

Here is a Postgres example but the latter works on Impala.

select
  percentile_disc(0.95) within group (order by id) as ptile95,
  percentile_disc(0.99) within group (order by id) as ptile99
from generate_series(1,100) as t(id);

 ptile95 | ptile99
---------+---------
      95 |      99
(1 row)

with t1 as(
  select
    id,
    cume_dist() over (order by id) as cd
  from generate_series(1,100) as t(id)
)
select
  min(case when cd >= 0.95 then id end) as ptile95,
  min(case when cd >= 0.99 then id end) as ptile99
from t1;
 ptile95 | ptile99
---------+---------
      95 |      99
(1 row)


On Tue, Aug 29, 2017 at 11:17 AM, Lars Volker <[email protected]> wrote:

> Impala currently doesn't have percentile functions. Work on those is
> tracked in IMPALA-3602 and contributions are always welcome. You might be
> able to get what you need using analytic functions, e.g. NTILE:
> https://www.cloudera.com/documentation/enterprise/
> latest/topics/impala_analytic_functions.html#ntile
>
> On Tue, Aug 29, 2017 at 8:39 AM, Gayathri Devi <[email protected]
> >
> wrote:
>
> > Hi,
> >
> > I have a table on hbase i want to query on impala to calculate 95 th and
> 99
> > th percentile? any built in function available or user defined function?
> >
>

Reply via email to