[ https://issues.apache.org/jira/browse/FLINK-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208869#comment-15208869 ]
Todd Lisonbee commented on FLINK-3613:
--------------------------------------
I created a related JIRA, FLINK-3664, with a design for a summarize()
function.
I think FLINK-3664 would be a better place for me to start than improving the
existing aggregations.
> Add standard deviation, mean, variance to list of Aggregations
> --------------------------------------------------------------
>
> Key: FLINK-3613
> URL: https://issues.apache.org/jira/browse/FLINK-3613
> Project: Flink
> Issue Type: Improvement
> Reporter: Todd Lisonbee
> Priority: Minor
> Attachments: DataSet-Aggregation-Design-March2016-v1.txt
>
>
> Implement standard deviation, mean, and variance for
> org.apache.flink.api.java.aggregation.Aggregations.
> Ideally, the implementation should be single-pass and numerically stable.
> References:
> "Scalable and Numerically Stable Descriptive Statistics in SystemML", Tian et
> al., International Conference on Data Engineering 2012
> http://dl.acm.org/citation.cfm?id=2310392
> "The Kahan summation algorithm (also known as compensated summation) reduces
> the numerical errors that occur when adding a sequence of finite precision
> floating point numbers. Numerical errors arise due to truncation and
> rounding. These errors can lead to numerical instability when calculating
> variance."
> https://en.wikipedia.org/wiki/Kahan_summation_algorithm
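
To make the "single pass and numerically stable" requirement above concrete, here is a minimal sketch in plain Java. It is not Flink code and not the design attached to this issue: the class names RunningStats and KahanSum are hypothetical, and the variance update uses Welford's algorithm with the pairwise merge formula of Chan et al., chosen here only as one well-known way to meet the requirement. KahanSum illustrates the compensated summation described in the Wikipedia reference.

```java
/** Hypothetical helper (not part of Flink): single-pass mean and variance via Welford's update. */
final class RunningStats {
    private long count;
    private double mean;
    private double m2; // running sum of squared deviations from the current mean

    /** Folds one value into the running statistics. */
    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean); // uses the old and the updated mean
    }

    /** Combines two partial results, e.g. from two parallel partitions. */
    RunningStats merge(RunningStats other) {
        RunningStats r = new RunningStats();
        r.count = count + other.count;
        if (r.count == 0) {
            return r;
        }
        double delta = other.mean - mean;
        r.mean = mean + delta * other.count / r.count;
        r.m2 = m2 + other.m2 + delta * delta * ((double) count * other.count / r.count);
        return r;
    }

    long count() { return count; }

    double mean() { return count == 0 ? Double.NaN : mean; }

    /** Sample variance (divides by n - 1); NaN when fewer than two values were seen. */
    double variance() { return count < 2 ? Double.NaN : m2 / (count - 1); }

    double stdDev() { return Math.sqrt(variance()); }
}

/** Hypothetical helper (not part of Flink): Kahan (compensated) summation. */
final class KahanSum {
    private double sum;
    private double compensation; // low-order bits lost by previous additions

    void add(double x) {
        double y = x - compensation;  // re-apply the error from the last step
        double t = sum + y;           // low-order bits of y may be lost here
        compensation = (t - sum) - y; // recover what was lost
        sum = t;
    }

    double value() { return sum; }
}
```

The merge() method is included because a distributed aggregation needs to combine per-partition results, not just fold values sequentially; whether an actual Flink aggregation would be structured this way is an assumption of this sketch.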