Hey,
I'm not sure about writing a new UDAF for percentiles, but I believe Impala
already has similar support through Apache DataSketches.
This is how you can use it:
1) First create a sketch from a column: ds_kll_sketch(col_name)
2) Then use this sketch to get approximate quantiles:
ds_kll_quantile(sketch, 0.1)
The second argument is a rank between 0 and 1, so 0.1 corresponds to the
10th percentile.
E.g. SELECT ds_kll_quantile(ds_kll_sketch(col_name), 0.1) FROM tbl;

Note that you can also store the sketch from 1) in a STRING column of a
table, so that you don't have to re-sketch the whole column for every
quantile calculation.
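
A minimal sketch of that store-and-reuse pattern, assuming the source table
and column from the example above (tbl, col_name). The table name
sketch_store and the column name sketch are hypothetical, made up only for
illustration. The input column may also need a CAST if its type is not one
that ds_kll_sketch() accepts, which is worth checking in the Impala docs.

```sql
-- Hypothetical table holding one serialized KLL sketch per row.
CREATE TABLE sketch_store (sketch STRING);

-- Sketch the column once and keep the result.
INSERT INTO sketch_store
SELECT ds_kll_sketch(col_name) FROM tbl;

-- Reuse the stored sketch for several ranks without rescanning tbl.
SELECT ds_kll_quantile(sketch, 0.1) AS p10,
       ds_kll_quantile(sketch, 0.5) AS p50,
       ds_kll_quantile(sketch, 0.9) AS p90
FROM sketch_store;
```

The results are approximate, so they will not necessarily match
np.percentile() exactly on the same data.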

Let me know if this helps!
Gabor


On Tue, Oct 18, 2022 at 7:30 PM Mao Hangjun <maohang...@gmail.com> wrote:

> Hi, Impala devs!
>
>   Can you help me implement approx_percentile?
>
> Impala UDAF:
> https://github.com/cloudera/impala-udf-samples
>
> Like:
> https://impala.apache.org/docs/build/html/topics/impala_appx_median.html
>
> SQL Like:
> select approx_percentile(c1, 0.1) from test_table;
>
> Python Like:
> import numpy as np
>
> a = np.array([1,2,3,4,5,6,7,8,9,10])
>
> print(np.median(a))
> print(np.percentile(a,10))
> print(np.percentile(a,20))
> print(np.percentile(a,30))
> print(np.percentile(a,40))
> print(np.percentile(a,50))
> print(np.percentile(a,60))
> print(np.percentile(a,70))
> print(np.percentile(a,80))
> print(np.percentile(a,90))
>
> Thank you very much.
>
