[ https://issues.apache.org/jira/browse/IMPALA-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17169025#comment-17169025 ]

ASF subversion and git services commented on IMPALA-9959:
---------------------------------------------------------

Commit 033a4607e2c9cd5a107a3af01f3fb3490bc5bc6e in impala's branch 
refs/heads/master from Gabor Kaszab
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=033a460 ]

IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions

ds_kll_sketch() is an aggregate function that receives a FLOAT
parameter (e.g. a FLOAT column of a table) and returns a serialized
Apache DataSketches KLL sketch of the input data set, wrapped in a
STRING. This sketch can be saved into a table or view and later used
for quantile approximations. ds_kll_quantile() receives two
parameters: a STRING that contains a serialized KLL sketch and a
DOUBLE that represents the rank of the quantile in the range [0,1].
E.g. rank=0.1 returns the approximate value in the sketch such that
10% of the sketched items are less than or equal to it.
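The rank semantics above can be illustrated with a short Python sketch that computes, exactly and without any sketch, the value that ds_kll_quantile() approximates. The helper name exact_quantile and the tie-breaking rule (return the smallest input value such that at least rank * n items are less than or equal to it) are assumptions for illustration; this is not Impala or DataSketches code.

```python
import math
from typing import Sequence


def exact_quantile(values: Sequence[float], rank: float) -> float:
    """Hypothetical helper, for illustration only (not Impala/DataSketches code).

    Returns the smallest value v such that at least `rank` of the items
    are less than or equal to v. rank=0 yields the minimum, rank=1 the maximum.
    """
    if not 0.0 <= rank <= 1.0:
        raise ValueError("rank must be in [0, 1]")
    if not values:
        raise ValueError("empty input")
    ordered = sorted(values)
    # Number of items that must be <= the answer. round() absorbs tiny
    # floating-point error in rank * n; max(1, ...) makes rank=0 return the minimum.
    needed = max(1, math.ceil(round(rank * len(ordered), 9)))
    return ordered[needed - 1]
```

For example, over the ten values 1.0 to 10.0, rank=0.1 returns 1.0 (one item out of ten is less than or equal to it) and rank=0.5 returns 5.0. A KLL sketch answers the same question approximately while keeping only a small, bounded number of items.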

Testing:
  - Added automated tests on small data sets to check the basic
    functionality of sketching and of getting an approximate quantile.
  - Tested on TPCH25_parquet.lineitem to check that sketching and
    approximation also work at a larger scale, where the serialize/merge
    phases are required as well. At this scale the error of the
    quantile approximation is within 1-1.5%.
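At a larger scale the aggregation runs in several phases, so partial sketches have to be combined, which is why the merge phase matters. The Python sketch below is a simplified, hypothetical randomized-compactor quantile sketch in the spirit of KLL, meant only to show how merging and approximate quantiles can work. It is not the DataSketches implementation: the class name CompactorSketch and the parameter k are assumptions, every level has the same capacity (real KLL shrinks lower-level capacities geometrically), exact min/max are not tracked, and serialization is omitted.

```python
import random
from typing import List


class CompactorSketch:
    """Hypothetical, simplified quantile sketch in the spirit of KLL.

    An item stored at level h stands for 2**h input items. When a level holds
    more than k items it is sorted and every other item (random start offset)
    is promoted to level h + 1. This halves the level while keeping the total
    weight equal to the number of inputs.
    """

    def __init__(self, k: int = 200, seed: int = 0) -> None:
        self.k = k
        self.n = 0
        self.levels: List[List[float]] = [[]]
        self.rng = random.Random(seed)

    def update(self, value: float) -> None:
        self.levels[0].append(float(value))
        self.n += 1
        self._compress()

    def merge(self, other: "CompactorSketch") -> None:
        # Combine partial sketches level by level, then compact overflowing levels.
        while len(self.levels) < len(other.levels):
            self.levels.append([])
        for h, items in enumerate(other.levels):
            self.levels[h].extend(items)
        self.n += other.n
        self._compress()

    def _compress(self) -> None:
        h = 0
        while h < len(self.levels):
            level = self.levels[h]
            if len(level) > self.k:
                level.sort()
                # With an odd count, keep one item here so total weight is conserved.
                leftover = [level.pop()] if len(level) % 2 else []
                if h + 1 == len(self.levels):
                    self.levels.append([])
                offset = self.rng.randint(0, 1)
                self.levels[h + 1].extend(level[offset::2])
                self.levels[h] = leftover
            h += 1

    def quantile(self, rank: float) -> float:
        if not 0.0 <= rank <= 1.0:
            raise ValueError("rank must be in [0, 1]")
        if self.n == 0:
            raise ValueError("empty sketch")
        # Each retained item carries weight 2**level; walk them in sorted order.
        weighted = sorted(
            (v, 1 << h) for h, level in enumerate(self.levels) for v in level
        )
        target = rank * self.n
        cumulative = 0
        for value, weight in weighted:
            cumulative += weight
            if cumulative >= target:
                return value
        return weighted[-1][0]
```

Usage: build one CompactorSketch per partition, call merge() to combine them, then call quantile(0.5) for an approximate median. With fewer than k inputs no compaction happens and the answers are exact. Larger k trades memory for accuracy, which is the same trade-off the real KLL parameter controls.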

Change-Id: I11de5fe10bb5d0dd42fb4ee45c4f21cb31963e52
Reviewed-on: http://gerrit.cloudera.org:8080/16235
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Implement ds_kll_sketch() and ds_kll_quantile() functions
> ---------------------------------------------------------
>
>                 Key: IMPALA-9959
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9959
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend, Frontend
>            Reporter: Gabor Kaszab
>            Assignee: Gabor Kaszab
>            Priority: Major
>
> 1:
>  STRING ds_kll_sketch(float)
>  Accepts a float parameter and returns a DataSketches KLL sketch as a STRING
> (or as BINARY once that work is submitted).
> 2:
>  FLOAT (or DOUBLE?) ds_kll_quantile(KLL sketch, double)
>  Accepts two parameters: a KLL sketch created by ds_kll_sketch() and a double
> in [0, 1] that represents the rank of the quantile.
> At this point I'm not sure about the return type: it is either FLOAT or
> DOUBLE, which needs further investigation.
> Example:
> {code:sql}
> select ds_kll_quantile(ds_kll_sketch(cast(int_col as float)), 1) from 
> table_name;
> +------+
> | _c0  |
> +------+
> | 1.0  |
> +------+
> {code}
> Further examples can be found here:
>  [https://datasketches.apache.org/docs/Quantiles/QuantilesCppExample.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)