[
https://issues.apache.org/jira/browse/IMPALA-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gabor Kaszab closed IMPALA-10020.
---------------------------------
Fix Version/s: Impala 4.0
Resolution: Fixed
> Implement ds_kll_cdf() function
> -------------------------------
>
> Key: IMPALA-10020
> URL: https://issues.apache.org/jira/browse/IMPALA-10020
> Project: IMPALA
> Issue Type: New Feature
> Components: Backend, Frontend
> Reporter: Gabor Kaszab
> Assignee: Gabor Kaszab
> Priority: Major
> Fix For: Impala 4.0
>
>
> Requirements for ds_kll_cdf() (Cumulative Distribution Function):
> - Receives a serialized KLL sketch in BINARY type as first parameter (in
> Impala it can be STRING until BINARY is supported).
> - Receives one or more float values used as split points that divide the
> sketched data into ranges.
> - In Hive the return type is an array of doubles. However, Impala can't
> return complex types from functions at this point so we have to find some
> alternative approaches to implement this function. Follow whatever solution
> came up in https://issues.apache.org/jira/browse/IMPALA-9962
> An example:
> {code:java}
> select ds_kll_cdf(sketch_col, 1, 2, 3, 4) from sketches_table;
> {code}
> This will generate the following ranges: (-inf, 1), (-inf, 2), (-inf, 3),
> (-inf, 4), (-inf, +inf)
> In Hive, the result would be an array of 5 doubles, one per range, where
> each number is a value in [0,1] giving the estimated fraction of the
> sketched items that fall into that range.
> For example, for a sketch built from the input values 1, 2, 3, 4, 5, five
> split points yield six results:
> {code:java}
> select ds_kll_cdf(f, 1, 3, 4, 5, 10) from kll_sketches;
> +----------------------------+
> | _c0 |
> +----------------------------+
> | [0.0,0.4,0.6,0.8,1.0,1.0] |
> +----------------------------+
> {code}
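
The numbers in the example above can be reproduced with a small exact (non-sketch) computation. This is only an illustrative sketch: the helper name exact_cdf is hypothetical, a real KLL sketch returns approximate ranks rather than exact ones, and the strict "less than" comparison is an assumption inferred from the example output (the rank of split point 1 is 0.0).

```python
from bisect import bisect_left


def exact_cdf(values, split_points):
    """Exact CDF at each split point, plus a final entry for (-inf, +inf).

    Hypothetical helper for illustration; not part of Impala or DataSketches.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        return []
    # bisect_left counts the items strictly less than the split point,
    # i.e. the items that fall into the range (-inf, p).
    ranks = [bisect_left(ordered, p) / n for p in split_points]
    # The last range (-inf, +inf) always contains every item.
    ranks.append(1.0)
    return ranks


result = exact_cdf([1, 2, 3, 4, 5], [1, 3, 4, 5, 10])
print(result)  # [0.0, 0.4, 0.6, 0.8, 1.0, 1.0]
```

With N split points the result always has N + 1 entries, which matches the 4-split-point query producing 5 ranges and the 5-split-point query producing 6 values.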