Hey,

I don't know about writing a new UDAF for percentile, but I believe we already have some similar support using Apache DataSketches in Impala. This is how you can use it:
1) First create a sketch from a column: ds_kll_sketch(col_name)
2) Then use this sketch to get approximated quantiles: ds_kll_quantile(sketch, 0.1)

E.g.
SELECT ds_kll_quantile(ds_kll_sketch(col_name), 0.1) FROM tbl;

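If you want to sanity-check the approximated quantiles on a small sample, something like the following might help. It is only a minimal sketch in plain Python: the helper name percentile is hypothetical, and it assumes the linear-interpolation definition that numpy.percentile uses by default. Note that it takes the percentile on a 0-100 scale, while ds_kll_quantile above takes a rank between 0 and 1 (0.1 for the 10th percentile). Since the sketch-based result is an approximation, it need not match these exact values.

```python
import math


def percentile(values, p):
    """Exact percentile of `values` for p in [0, 100].

    Hypothetical reference helper: uses linear interpolation between the
    two closest ranks, which is the default method of numpy.percentile.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")

    ordered = sorted(values)
    # Fractional position of the requested percentile in the sorted data.
    rank = (len(ordered) - 1) * p / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    # Interpolate between the two neighbouring values.
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    for p in range(10, 100, 10):
        print(p, percentile(data, p))
```

You can then compare these exact values with what ds_kll_quantile returns for the same data at ranks 0.1, 0.2, and so on.
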
Note, you can also store the sketch from 1) into a string column of a table so that you don't have to sketch the whole column for every quantile calculation.

Let me know if this helps!

Gabor

On Tue, Oct 18, 2022 at 7:30 PM Mao Hangjun <maohang...@gmail.com> wrote:

> Hi , impala dev!
>
> Can you help me to implement approx_percentile.
>
> Impala UDAF:
> https://github.com/cloudera/impala-udf-samples
>
> Like:
> https://impala.apache.org/docs/build/html/topics/impala_appx_median.html
>
> SQL Like:
> select approx_percentile(c1, 0.1) from test_table;
>
> Python Like:
> import numpy as np
>
> a = np.array([1,2,3,4,5,6,7,8,9,10])
>
> print(np.median(a))
> print(np.percentile(a,10))
> print(np.percentile(a,20))
> print(np.percentile(a,30))
> print(np.percentile(a,40))
> print(np.percentile(a,50))
> print(np.percentile(a,60))
> print(np.percentile(a,70))
> print(np.percentile(a,80))
> print(np.percentile(a,90))
>
> Thank you very much.
>