[
https://issues.apache.org/jira/browse/SPARK-53991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18032269#comment-18032269
]
Daniel Tenedorio edited comment on SPARK-53991 at 11/3/25 10:04 PM:
--------------------------------------------------------------------
We can use the following function names. For each category, we can support long
integer, single-precision floating-point, and double-precision floating-point
variants. Each function name must specify the variant because the sketch buffer
representation differs depending on the data type of the input values.
Aggregate functions to consume input values and return a sketch buffer:
{{kll_sketch_agg_bigint(col)}}
{{kll_sketch_agg_float(col)}}
{{kll_sketch_agg_double(col)}}
Scalar functions to merge two sketch buffers together into another sketch
buffer:
{{kll_sketch_merge_bigint(sketch1, sketch2)}}
{{kll_sketch_merge_float(sketch1, sketch2)}}
{{kll_sketch_merge_double(sketch1, sketch2)}}
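To illustrate how the aggregate, merge, and count categories are meant to fit together, here is a minimal sketch in Python. It is not the DataSketches KLL algorithm: an exact sorted tuple stands in for the sketch buffer, and the names sketch_agg, sketch_merge, and sketch_get_n are hypothetical helpers chosen only for this illustration. The point is the expected contract: merging sketches built from partitions of the input should be equivalent to building one sketch over all of it.

```python
from heapq import merge as merge_sorted

# Hypothetical stand-in: the "sketch buffer" is an exact sorted tuple.
# A real KLL sketch keeps a compact, approximate summary instead.

def sketch_agg(values):
    """Consume input values and return a sketch buffer (here: a sorted tuple)."""
    return tuple(sorted(values))

def sketch_merge(sketch1, sketch2):
    """Merge two sketch buffers into a new buffer, leaving the inputs unchanged."""
    return tuple(merge_sorted(sketch1, sketch2))

def sketch_get_n(sketch):
    """Count the number of items collected in the sketch so far."""
    return len(sketch)

# Two partial aggregations, as if computed over separate partitions.
part1 = sketch_agg([5, 1, 9])
part2 = sketch_agg([7, 3])
merged = sketch_merge(part1, part2)
```

With a real KLL sketch the merged result is an approximation, so the equivalence holds only within the sketch's error bounds rather than exactly as in this stand-in.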
Scalar functions to extract from the quantiles sketch the quantile value at a
given input rank (for example, "float median = sketch.getQuantile(0.5)"). We
can also implement the functions in this category to accept an array of input
ranks and return an array of result quantiles:
{{kll_sketch_get_quantile_bigint(sketch, rank)}}
{{kll_sketch_get_quantile_float(sketch, rank)}}
{{kll_sketch_get_quantile_double(sketch, rank)}}
Scalar functions to extract from the quantiles sketch the rank of a given input
quantile value (for example, "double rankOf1000 = sketch.getRank(1000)"). We
can also implement the functions in this category to accept an array of input
quantiles and return an array of result ranks:
{{kll_sketch_get_rank_bigint(sketch, quantile)}}
{{kll_sketch_get_rank_float(sketch, quantile)}}
{{kll_sketch_get_rank_double(sketch, quantile)}}
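The relationship between the get_quantile and get_rank categories can be sketched as follows. This is an assumption-laden illustration, not the KLL implementation: it computes exact answers over a sorted list, assumes normalized ranks in [0, 1] with inclusive semantics (the rank of a value is the fraction of items less than or equal to it), and uses hypothetical helper names. The array-accepting variants simply map the scalar function over the inputs.

```python
from bisect import bisect_right
from math import ceil

# Exact stand-in for a sketch: a non-empty sorted list of the input values.

def get_rank(sorted_values, quantile):
    """Normalized inclusive rank: fraction of items <= the given value."""
    return bisect_right(sorted_values, quantile) / len(sorted_values)

def get_quantile(sorted_values, rank):
    """Smallest item whose inclusive rank is at least the requested rank."""
    if not 0.0 <= rank <= 1.0:
        raise ValueError("rank must be in [0, 1]")
    index = max(ceil(rank * len(sorted_values)) - 1, 0)
    return sorted_values[index]

def get_quantiles(sorted_values, ranks):
    """Array variant: one result quantile per input rank."""
    return [get_quantile(sorted_values, r) for r in ranks]

def get_ranks(sorted_values, quantiles):
    """Array variant: one result rank per input quantile."""
    return [get_rank(sorted_values, q) for q in quantiles]

values = sorted([10, 20, 30, 40, 50])
median = get_quantile(values, 0.5)       # 30
rank_of_1000 = get_rank(values, 1000)    # 1.0, every item is <= 1000
```

A real sketch returns approximate results for both directions, so a round trip through rank and quantile is only guaranteed to land close to the starting value.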
Scalar functions to count the number of items collected in the sketch so far:
{{kll_sketch_get_n_bigint(sketch)}}
{{kll_sketch_get_n_float(sketch)}}
{{kll_sketch_get_n_double(sketch)}}
Optional scalar functions to return a string representation of a sketch buffer:
{{kll_sketch_to_string_bigint(sketch)}}
{{kll_sketch_to_string_float(sketch)}}
{{kll_sketch_to_string_double(sketch)}}
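The naming scheme above is a cross product of six categories and three type suffixes. A short Python snippet, offered only as a convenience for checking the list (the constant and helper names are made up for this illustration), enumerates all 18 proposed names:

```python
from itertools import product

# Categories and type suffixes as proposed in this comment.
CATEGORIES = ["agg", "merge", "get_quantile", "get_rank", "get_n", "to_string"]
TYPE_SUFFIXES = ["bigint", "float", "double"]

def kll_function_names():
    """Return every proposed function name, grouped by category."""
    return [
        f"kll_sketch_{category}_{suffix}"
        for category, suffix in product(CATEGORIES, TYPE_SUFFIXES)
    ]

names = kll_function_names()
```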
> Add support for KLL quantiles functions based on DataSketches
> -------------------------------------------------------------
>
> Key: SPARK-53991
> URL: https://issues.apache.org/jira/browse/SPARK-53991
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 4.1.0
> Reporter: Daniel Tenedorio
> Assignee: Daniel Tenedorio
> Priority: Major
> Labels: pull-request-available
>
> Documentation reference:
> [https://datasketches.apache.org/docs/KLL/KLLSketch.html].
> DataSketches code API reference:
> [https://apache.github.io/datasketches-java/6.1.0/org/apache/datasketches/kll/KllLongsSketch.html]
>
> Reference PR for recently adding Theta sketches:
> [https://github.com/apache/spark/pull/51298]