[ 
https://issues.apache.org/jira/browse/SPARK-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14945309#comment-14945309
 ] 

Xiangrui Meng commented on SPARK-10641:
---------------------------------------

To implement the numerically stable version, we should refactor the 
StdDevAgg implementation so that it also maintains running third and fourth 
central moments. StdDevAgg should then be renamed to CentralMomentAgg.
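
To make the idea concrete, here is a minimal sketch in plain Python (not 
Spark code) of an accumulator that keeps running second, third and fourth 
central moment sums, using the one-pass update and pairwise merge formulas 
from the Wikipedia page linked in the issue description. The class name 
CentralMoments, its field names, and the choice of population skewness and 
excess kurtosis are assumptions made for illustration; they are not the 
actual StdDevAgg/CentralMomentAgg design.

```python
import math
from dataclasses import dataclass, replace


@dataclass
class CentralMoments:
    """Hypothetical accumulator: count, mean, and sums of powers of deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of (x - mean)^2
    m3: float = 0.0  # sum of (x - mean)^3
    m4: float = 0.0  # sum of (x - mean)^4

    def update(self, x: float) -> None:
        """Fold one value into the running moments (one-pass update)."""
        n1 = self.n
        self.n += 1
        n = self.n
        delta = x - self.mean
        delta_n = delta / n
        delta_n2 = delta_n * delta_n
        term1 = delta * delta_n * n1
        self.mean += delta_n
        # Order matters: m4 uses the old m3 and m2, m3 uses the old m2.
        self.m4 += (term1 * delta_n2 * (n * n - 3 * n + 3)
                    + 6 * delta_n2 * self.m2 - 4 * delta_n * self.m3)
        self.m3 += term1 * delta_n * (n - 2) - 3 * delta_n * self.m2
        self.m2 += term1

    def merge(self, other: "CentralMoments") -> "CentralMoments":
        """Combine two partial accumulators, e.g. from two partitions."""
        if self.n == 0:
            return replace(other)
        if other.n == 0:
            return replace(self)
        na, nb = self.n, other.n
        n = na + nb
        delta = other.mean - self.mean
        mean = self.mean + delta * nb / n
        m2 = self.m2 + other.m2 + delta ** 2 * na * nb / n
        m3 = (self.m3 + other.m3
              + delta ** 3 * na * nb * (na - nb) / n ** 2
              + 3 * delta * (na * other.m2 - nb * self.m2) / n)
        m4 = (self.m4 + other.m4
              + delta ** 4 * na * nb * (na * na - na * nb + nb * nb) / n ** 3
              + 6 * delta ** 2 * (na * na * other.m2 + nb * nb * self.m2) / n ** 2
              + 4 * delta * (na * other.m3 - nb * self.m3) / n)
        return CentralMoments(n, mean, m2, m3, m4)

    def skewness(self) -> float:
        """Population skewness; NaN when the variance is zero or n == 0."""
        if self.n == 0 or self.m2 == 0:
            return float("nan")
        return math.sqrt(self.n) * self.m3 / self.m2 ** 1.5

    def kurtosis(self) -> float:
        """Population excess kurtosis; NaN when the variance is zero or n == 0."""
        if self.n == 0 or self.m2 == 0:
            return float("nan")
        return self.n * self.m4 / (self.m2 * self.m2) - 3.0
```

Variance and standard deviation fall out of the same state (m2 / n and its 
square root), which is why a single moment-based aggregate could back all 
four statistics.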

In the future, we need to make sure that codegen doesn't include unnecessary 
branches if kurtosis and skewness are not requested by the user.

Btw, there will be some room for optimization, e.g.

{code}
df.groupBy("key").agg(skewness("a"), kurtosis("a"))
{code}

will have duplicate computation, since both aggregates need the same 
underlying moments of column "a".
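
As a rough illustration of the optimization being hinted at, the plain 
Python sketch below (not Spark code) computes the moment sums once per key 
and derives both statistics from that shared result. The function names and 
the (key, value) row layout are hypothetical, population skewness and excess 
kurtosis are assumed, and a simple two-pass computation is used here only to 
keep the example short.

```python
import math
from collections import defaultdict


def central_moment_sums(values):
    """Return (n, m2, m3, m4): count and sums of powers of deviations."""
    n = len(values)
    mean = sum(values) / n
    devs = [v - mean for v in values]
    return (n,
            sum(d ** 2 for d in devs),
            sum(d ** 3 for d in devs),
            sum(d ** 4 for d in devs))


def skewness_and_kurtosis_by_key(rows):
    """Map each key to (skewness, kurtosis), sharing one moment computation."""
    groups = defaultdict(list)
    for key, a in rows:
        groups[key].append(a)

    result = {}
    for key, values in groups.items():
        # Computed once per key and reused by both statistics.
        n, m2, m3, m4 = central_moment_sums(values)
        if m2 == 0:
            result[key] = (float("nan"), float("nan"))
        else:
            skew = math.sqrt(n) * m3 / m2 ** 1.5
            kurt = n * m4 / (m2 * m2) - 3.0
            result[key] = (skew, kurt)
    return result
```

Evaluating the two aggregates independently would repeat the work done in 
central_moment_sums for every key; sharing the buffer avoids that.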

> skewness and kurtosis support
> -----------------------------
>
>                 Key: SPARK-10641
>                 URL: https://issues.apache.org/jira/browse/SPARK-10641
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, SQL
>            Reporter: Jihong MA
>            Assignee: Seth Hendrickson
>
> Implementing skewness and kurtosis support based on following algorithm:
> https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Higher-order_statistics


