[ https://issues.apache.org/jira/browse/SPARK-53991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18032269#comment-18032269 ]

Daniel Tenedorio commented on SPARK-53991:
------------------------------------------

We can use the following function names. For each category, we can support long
integer, single-precision floating-point, and double-precision floating-point
variants. The function name must specify which variant is in use, because the
sketch buffer representation differs depending on the input value data type.
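To illustrate why the element type cannot be inferred from an opaque buffer, here is a minimal Python sketch under a hypothetical layout: a 1-byte type tag followed by the retained items packed at the type's native width (8 bytes for bigint and double, 4 bytes for float). This is NOT the DataSketches serialization format; the {{serialize}}/{{deserialize}} helpers and the tag values are assumptions made for the example.

```python
import struct

# Hypothetical layout for illustration only (NOT the DataSketches format):
# one type-tag byte, then the retained items packed little-endian at native width.
FORMATS = {"bigint": "q", "float": "f", "double": "d"}
TAGS = {"bigint": 0, "float": 1, "double": 2}


def serialize(items, kind):
    """Pack retained sketch items into a byte buffer for the given element type."""
    fmt = FORMATS[kind]
    return bytes([TAGS[kind]]) + struct.pack(f"<{len(items)}{fmt}", *items)


def deserialize(buf, kind):
    """Unpack a buffer, refusing to read it as a different element type."""
    if buf[0] != TAGS[kind]:
        raise ValueError(f"buffer was not written as {kind}")
    fmt = FORMATS[kind]
    count = (len(buf) - 1) // struct.calcsize(f"<{fmt}")
    return list(struct.unpack(f"<{count}{fmt}", buf[1:]))
```

The same three values produce a 25-byte buffer as bigint but a 13-byte buffer as float, so a reader must know (here via the tag, in the proposal via the function-name suffix) which layout it is decoding.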

Aggregate functions to consume input values and return a sketch buffer:

{{kll_sketch_agg_bigint(col)}}

{{kll_sketch_agg_float(col)}}

{{kll_sketch_agg_double(col)}}
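As a rough illustration of what an aggregate such as {{kll_sketch_agg_double(col)}} does per input row, here is a greatly simplified compactor-based sketch in Python. It borrows the core KLL idea (a stack of compactors where an item at level h stands for 2^h input values) but uses one fixed capacity k at every level instead of KLL's geometrically decreasing capacities, so it is a sketch under stated assumptions, not the DataSketches algorithm. The class name, {{toy_sketch_agg}}, and the {{k}}/{{seed}} parameters are hypothetical.

```python
import random


class ToyCompactorSketch:
    """Simplified compactor-based quantile sketch (KLL-style idea, not KLL itself).

    levels[h] holds items that each stand for 2**h input values. Every level uses
    the same capacity k, which real KLL does not.
    """

    def __init__(self, k=8, seed=0):
        self.k = k                      # per-level capacity (hypothetical parameter)
        self.levels = [[]]              # levels[h] holds items of weight 2**h
        self.n = 0                      # number of input values consumed
        self.rng = random.Random(seed)  # seeded so results are reproducible

    def update(self, value):
        """Consume one input value, as an aggregate's per-row update would."""
        self.levels[0].append(value)
        self.n += 1
        self._compress()

    def _compress(self):
        h = 0
        while h < len(self.levels):
            if len(self.levels[h]) > self.k:
                if h + 1 == len(self.levels):
                    self.levels.append([])
                buf = sorted(self.levels[h])
                # With an odd count, leave one item behind so total weight is preserved.
                keep = [buf.pop()] if len(buf) % 2 else []
                # Promote every other item (random offset); each now counts double.
                self.levels[h + 1].extend(buf[self.rng.randint(0, 1)::2])
                self.levels[h] = keep
            h += 1

    def weighted_items(self):
        """Return (value, weight) pairs for every retained item."""
        return [(v, 1 << h) for h, items in enumerate(self.levels) for v in items]


def toy_sketch_agg(values, k=8):
    """Hypothetical stand-in for an aggregate that folds a column into one sketch."""
    sketch = ToyCompactorSketch(k)
    for v in values:
        sketch.update(v)
    return sketch
```

While fewer than k values have been seen the sketch is exact; afterwards it retains only a handful of items per level while the weights still sum to the number of inputs.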

Scalar functions to merge two sketch buffers into a single sketch buffer:

{{kll_sketch_merge_bigint(sketch1, sketch2)}}

{{kll_sketch_merge_float(sketch1, sketch2)}}

{{kll_sketch_merge_double(sketch1, sketch2)}}
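For a level-structured sketch, merging is conceptually cheap: concatenate the two sketches level by level, then run the usual compaction so each level fits its capacity again. The minimal Python sketch below assumes a sketch is just a list of levels (levels[h] holds items of weight 2^h) with one fixed capacity k, which simplifies real KLL; {{merge}}, {{compress}}, and {{total_weight}} are hypothetical helpers, not the DataSketches API.

```python
import random


def compress(levels, k, rng):
    """Sort any over-capacity level and promote every other item one level up."""
    h = 0
    while h < len(levels):
        if len(levels[h]) > k:
            if h + 1 == len(levels):
                levels.append([])
            buf = sorted(levels[h])
            keep = [buf.pop()] if len(buf) % 2 else []  # odd count: keep one behind
            levels[h + 1].extend(buf[rng.randint(0, 1)::2])
            levels[h] = keep
        h += 1
    return levels


def merge(levels1, levels2, k=8, seed=0):
    """Combine two sketches into a new one without mutating the inputs."""
    depth = max(len(levels1), len(levels2))
    merged = [[] for _ in range(depth)]
    for levels in (levels1, levels2):
        for h, items in enumerate(levels):
            merged[h].extend(items)
    return compress(merged, k, random.Random(seed))


def total_weight(levels):
    """Number of input values the sketch represents."""
    return sum(len(items) << h for h, items in enumerate(levels))
```

Because compaction conserves total weight, the merged sketch represents exactly as many inputs as the two sources combined (here 7 + 4 = 11 for the inputs {{[[1, 2, 3], [10, 20]]}} and {{[[4, 5], [30]]}}).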

Scalar functions to extract from the quantiles sketch the single value at the
desired quantile, given an input rank (for example, "float median =
sketch.getQuantile(0.5)"). We can also implement the functions in this category
to accept an array of input ranks and return an array of result quantiles.

{{kll_sketch_get_quantile_bigint(sketch, rank)}}

{{kll_sketch_get_quantile_float(sketch, rank)}}

{{kll_sketch_get_quantile_double(sketch, rank)}}
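The quantile lookup over a sketch's retained items can be sketched as a weighted cumulative scan. This minimal Python example assumes the sketch exposes its retained items as (value, weight) pairs, that ranks are normalized to [0, 1] (as in the getQuantile(0.5) example), and an inclusive search criterion: return the smallest retained value whose cumulative weight reaches rank * total. The function names are hypothetical, and the array variant simply maps over the input ranks.

```python
def get_quantile(weighted_items, rank):
    """Smallest retained value whose cumulative weight reaches rank * total weight."""
    if not 0.0 <= rank <= 1.0:
        raise ValueError("rank must be in [0, 1]")
    items = sorted(weighted_items)  # sort by value
    if not items:
        raise ValueError("empty sketch")
    target = rank * sum(w for _, w in items)
    cumulative = 0
    for value, weight in items:
        cumulative += weight
        if cumulative >= target:
            return value
    return items[-1][0]  # guard against floating-point shortfall at rank 1.0


def get_quantiles(weighted_items, ranks):
    """Array variant: one quantile per requested rank."""
    return [get_quantile(weighted_items, r) for r in ranks]
```

For five unit-weight items 1..5, rank 0.5 yields 3 (the median), and ranks [0.0, 1.0] yield [1, 5]. A production version would sort once for the whole array of ranks.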

Scalar functions to extract from the quantiles sketch the single value
representing the rank of a given input quantile (for example, "double
rankOf1000 = sketch.getRank(1000)"). We can also implement the functions in this
category to accept an array of input quantiles and return an array of result
ranks.

{{kll_sketch_get_rank_bigint(sketch, quantile)}}

{{kll_sketch_get_rank_float(sketch, quantile)}}

{{kll_sketch_get_rank_double(sketch, quantile)}}
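The rank lookup is the inverse operation: the fraction of total weight at or below the given value. The minimal Python sketch below makes the same assumptions as a quantile lookup would (retained items exposed as (value, weight) pairs, normalized ranks, inclusive criterion); {{get_rank}} and {{get_ranks}} are hypothetical names, not the proposed SQL functions or the DataSketches API.

```python
def get_rank(weighted_items, quantile):
    """Fraction of total weight carried by retained items <= quantile."""
    total = sum(w for _, w in weighted_items)
    if total == 0:
        raise ValueError("empty sketch")
    at_or_below = sum(w for v, w in weighted_items if v <= quantile)
    return at_or_below / total


def get_ranks(weighted_items, quantiles):
    """Array variant: one rank per requested quantile."""
    return [get_rank(weighted_items, q) for q in quantiles]
```

Mirroring the getRank(1000) example, the retained items {{[(500, 2), (1000, 1), (2000, 1)]}} give a rank of 0.75 for 1000, since 3 of the 4 units of weight lie at or below it.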

Optionally, scalar functions to return a string representation of a sketch buffer:

{{kll_sketch_to_string_bigint(sketch)}}

{{kll_sketch_to_string_float(sketch)}}

{{kll_sketch_to_string_double(sketch)}}
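A string representation would mostly be useful for debugging. As a hypothetical example of the kind of summary it might contain, this Python sketch renders a level-structured sketch (levels[h] holds items of weight 2^h). The output format and the {{sketch_to_string}} name are assumptions; the real DataSketches summary format is not reproduced here.

```python
def sketch_to_string(levels, kind):
    """Render a human-readable summary of a level-structured sketch (hypothetical format)."""
    n = sum(len(items) << h for h, items in enumerate(levels))
    retained = [v for items in levels for v in items]
    lines = [
        f"KLL-style sketch ({kind})",
        f"  items seen (N): {n}",
        f"  items retained: {len(retained)}",
        f"  levels: {len(levels)}",
    ]
    if retained:
        # Min/max of retained items only; compaction may have dropped the true extremes.
        lines.append(f"  retained min/max: {min(retained)} / {max(retained)}")
    return "\n".join(lines)
```

For the levels {{[[1, 2], [10]]}} the summary reports N = 4 (two unit-weight items plus one item of weight 2) and three retained items.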

> Add support for KLL quantiles functions based on DataSketches
> -------------------------------------------------------------
>
>                 Key: SPARK-53991
>                 URL: https://issues.apache.org/jira/browse/SPARK-53991
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 4.1
>            Reporter: Daniel Tenedorio
>            Assignee: Daniel Tenedorio
>            Priority: Major
>
> Documentation reference: 
> [https://datasketches.apache.org/docs/KLL/KLLSketch.html].
> Reference PR that recently added Theta sketches: 
> https://github.com/apache/spark/pull/51298



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
