[ 
https://issues.apache.org/jira/browse/SPARK-26589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451936#comment-17451936
 ] 

Nicholas Chammas commented on SPARK-26589:
------------------------------------------

Just for reference, Stack Overflow provides evidence that a proper median 
function has been in high demand for some time:
 * [How can I calculate exact median with Apache Spark?|https://stackoverflow.com/q/28158729/877069] (14K views)
 * [How to find median and quantiles using Spark|https://stackoverflow.com/q/31432843/877069] (117K views)
 * [Median / quantiles within PySpark groupBy|https://stackoverflow.com/q/46845672/877069] (67K views)

> proper `median` method for spark dataframe
> ------------------------------------------
>
>                 Key: SPARK-26589
>                 URL: https://issues.apache.org/jira/browse/SPARK-26589
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Jan Gorecki
>            Priority: Minor
>
> I found multiple tickets asking for a median function to be implemented in 
> Spark. Most of those tickets link to "SPARK-6761 Approximate quantile" as 
> a duplicate. However, approximate quantile is only a workaround for the 
> lack of a median function. I am therefore filing this feature request for a 
> proper, exact (not approximate) median function. I am aware of the 
> difficulties that a distributed environment causes when computing a median; 
> nevertheless, I don't think those difficulties are a good enough reason to 
> drop the `median` function from the scope of Spark. I am not asking for an 
> efficient median, but for an exact one.
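
To make the exact-versus-approximate distinction in the description concrete, here is a minimal plain-Python sketch. It does not use Spark and is not Spark's algorithm: `exact_median` and `sampled_median` are hypothetical helper names, and the sampling shortcut is only a crude stand-in for an approximate quantile, chosen to show how an approximation can differ from the exact answer.

```python
def exact_median(values):
    """Exact median: sort every value, then take the middle element
    (or the mean of the two middle elements for an even count)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def sampled_median(values, step):
    """Crude approximation (an assumption for illustration only):
    look at every `step`-th value and take the median of that sample."""
    return exact_median(values[::step])


data = [7, 1, 5, 3, 9, 2, 8, 4]
print("exact:  ", exact_median(data))
print("sampled:", sampled_median(data, 3))
```

The exact version has to see and order all of the data, which is the costly part in a distributed setting; the sampled version is cheaper but can land on a different value.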



