[ 
https://issues.apache.org/jira/browse/SPARK-53886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18052493#comment-18052493
 ] 

Christopher Boumalhab commented on SPARK-53886:
-----------------------------------------------

Hi [~Gengliang.Wang], [~dtenedor] and I have already worked on these. Sorry, I 
forgot to update this issue.

> Percentile estimation functions
> -------------------------------
>
>                 Key: SPARK-53886
>                 URL: https://issues.apache.org/jira/browse/SPARK-53886
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 0.5.0
>            Reporter: Gengliang Wang
>            Assignee: Christopher Boumalhab
>            Priority: Major
>
> Similar to https://issues.apache.org/jira/browse/SPARK-53885, it would be 
> useful to have the following percentile estimation functions based on data 
> sketches. These functions provide a lightweight, mergeable representation of 
> percentile distributions, enabling efficient approximate percentile 
> computation over large or streaming datasets without maintaining full data 
> samples.
>  * *APPROX_PERCENTILE_ACCUMULATE(expr[, maxItemsTracked]):* Creates an 
> intermediate state object that incrementally accumulates values for 
> percentile estimation using a sketch-based algorithm.
>  * *APPROX_PERCENTILE_ESTIMATE(state[, k]):* Returns the approximate 
> percentile value(s) from a previously accumulated sketch state.
>  * *APPROX_PERCENTILE_COMBINE(expr[, maxItemsTracked]):* Merges two 
> intermediate sketch states to enable distributed or parallel percentile 
> estimation.
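
The accumulate / combine / estimate pattern described in the issue can be 
illustrated with a small plain-Python toy. This is only a sketch under stated 
assumptions, not Spark's implementation and not the algorithm used by any real 
sketch library: the function names, the dict-based state, and the 
"keep the lower value of each adjacent pair" compaction rule are all 
hypothetical and chosen for brevity. The percentile is passed as a fraction 
between 0 and 1, which is also an assumption.

```python
# Toy mergeable percentile state. Hypothetical names; NOT Spark's implementation.
# State: {"k": capacity, "items": [(value, weight), ...]}


def _compact(state):
    """While over capacity, sort and collapse adjacent pairs into one item.

    Each collapsed pair keeps the lower value and the combined weight, so the
    total weight always equals the number of values accumulated so far.
    """
    items = state["items"]
    while len(items) > state["k"]:
        items.sort()
        merged = []
        for i in range(0, len(items) - 1, 2):
            (low, w_low), (_, w_high) = items[i], items[i + 1]
            merged.append((low, w_low + w_high))
        if len(items) % 2:  # an odd item out is carried over unchanged
            merged.append(items[-1])
        items = merged
    state["items"] = items


def accumulate(values, max_items_tracked=64):
    """Build an intermediate state from raw values (one partition's worth)."""
    if max_items_tracked < 1:
        raise ValueError("max_items_tracked must be at least 1")
    state = {"k": max_items_tracked, "items": []}
    for v in values:
        state["items"].append((v, 1))
        _compact(state)
    return state


def combine(left, right):
    """Merge two intermediate states into one, re-applying the capacity limit."""
    merged = {
        "k": min(left["k"], right["k"]),
        "items": left["items"] + right["items"],
    }
    _compact(merged)
    return merged


def estimate(state, percentile):
    """Return an approximate value at `percentile` (a fraction in [0, 1])."""
    if not 0.0 <= percentile <= 1.0:
        raise ValueError("percentile must be between 0 and 1")
    items = sorted(state["items"])
    if not items:
        return None
    target = percentile * sum(weight for _, weight in items)
    running = 0
    for value, weight in items:
        running += weight
        if running >= target:
            return value
    return items[-1][0]


if __name__ == "__main__":
    # Two "partitions" are accumulated independently, then merged and queried.
    part_a = accumulate(range(1, 51), max_items_tracked=16)
    part_b = accumulate(range(51, 101), max_items_tracked=16)
    print(estimate(combine(part_a, part_b), 0.5))
```

The point of the split is that the states built by accumulate can be stored or 
shipped between workers and merged later, so the raw values never need to be 
revisited. A real sketch would use a compaction scheme with proven error 
bounds rather than the biased rule used here.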



