[ https://issues.apache.org/jira/browse/IMPALA-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17169025#comment-17169025 ]
ASF subversion and git services commented on IMPALA-9959:
---------------------------------------------------------
Commit 033a4607e2c9cd5a107a3af01f3fb3490bc5bc6e in impala's branch
refs/heads/master from Gabor Kaszab
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=033a460 ]
IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions
ds_kll_sketch() is an aggregate function that receives a float
parameter (e.g. a float column of a table) and returns a serialized
Apache DataSketches KLL sketch of the input data set wrapped into
STRING type. This sketch can be saved into a table or view and later
used for quantile approximations. ds_kll_quantile() receives two
parameters: a STRING parameter that contains a serialized KLL sketch
and a DOUBLE that represents the rank of the quantile in the range of
[0,1]. E.g. rank=0.1 returns the approximate value in the sketch for
which 10% of the sketched items are less than or equal to that value.
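To make the rank semantics above concrete, here is a minimal Python sketch
that computes the exact value a quantile approximation is aiming for. It is
not Impala or DataSketches code: the helper name exact_quantile and the sample
data are hypothetical, and it sorts the whole input instead of building a KLL
sketch.

```python
import math


def exact_quantile(values, rank):
    """Smallest value v such that at least `rank` of the items are <= v.

    Hypothetical helper for illustration only; a KLL sketch approximates
    this answer without keeping every item.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= rank <= 1.0:
        raise ValueError("rank must be in [0, 1]")
    ordered = sorted(values)
    # Position at which the cumulative fraction of items first reaches `rank`.
    index = max(math.ceil(rank * len(ordered)) - 1, 0)
    return ordered[index]


data = [float(x) for x in range(1, 101)]  # 1.0 .. 100.0
print(exact_quantile(data, 0.1))  # value with 10% of the items at or below it
print(exact_quantile(data, 1.0))  # rank=1 yields the maximum
```

With rank=1 the result is the largest item, which matches the shape of the
example query in the issue description.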
Testing:
- Added automated tests on small data sets to check the basic
functionality of sketching and getting a quantile approximation.
- Tested on TPCH25_parquet.lineitem to check that sketching and
approximating also work at a larger scale, where serialize/merge
phases are required. At this scale the error of the quantile
approximation is within 1-1.5%.
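The commit message does not say how the 1-1.5% error was measured. One common
way to judge a quantile approximation is its rank error: the gap between the
requested rank and the true rank of the returned value. The Python sketch
below shows that measure under this assumption; the helper name rank_error and
the sample numbers are hypothetical.

```python
def rank_error(values, approx_value, requested_rank):
    """Absolute gap between requested_rank and the true rank of approx_value.

    Hypothetical helper; assumes "error" means rank error, which the
    commit message does not confirm.
    """
    if not values:
        raise ValueError("values must be non-empty")
    # True rank: fraction of items less than or equal to approx_value.
    at_or_below = sum(1 for v in values if v <= approx_value)
    return abs(at_or_below / len(values) - requested_rank)


data = [float(x) for x in range(1, 101)]  # 1.0 .. 100.0
# Suppose an approximation returned 51.0 for rank 0.5 (the exact answer is 50.0).
print(rank_error(data, 51.0, 0.5))
```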
Change-Id: I11de5fe10bb5d0dd42fb4ee45c4f21cb31963e52
Reviewed-on: http://gerrit.cloudera.org:8080/16235
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Implement ds_kll_sketch() and ds_kll_quantile() functions
> ---------------------------------------------------------
>
> Key: IMPALA-9959
> URL: https://issues.apache.org/jira/browse/IMPALA-9959
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend, Frontend
> Reporter: Gabor Kaszab
> Assignee: Gabor Kaszab
> Priority: Major
>
> 1:
> STRING ds_kll_sketch(float)
> Accepts float as parameter and returns a DataSketches KLL sketch in string
> type (or binary once that work is submitted).
> 2:
> FLOAT (or DOUBLE?) ds_kll_quantile(KLL sketch, double)
> Accepts two parameters: a KLL sketch created by ds_kll_sketch() and a double
> in [0, 1] to represent the quantile.
> At this point I'm not sure about the return type: it is either a float or a
> double, and this needs further investigation.
> Example:
> {code:sql}
> select ds_kll_quantile(ds_kll_sketch(cast(int_col as float)), 1) from
> table_name;
> +------+
> | _c0 |
> +------+
> | 1.0 |
> +------+
> {code}
> Some further examples can be found here:
> [https://datasketches.apache.org/docs/Quantiles/QuantilesCppExample.html]