[ https://issues.apache.org/jira/browse/FLINK-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389797#comment-17389797 ]
Gábor Gévay commented on FLINK-2142:
------------------------------------
A recent paper on this topic has since been published:
http://www.vldb.org/pvldb/vol14/p1818-poepsel-lemaitre.pdf
> GSoC project: Exact and Approximate Statistics for Data Streams and Windows
> ---------------------------------------------------------------------------
>
> Key: FLINK-2142
> URL: https://issues.apache.org/jira/browse/FLINK-2142
> Project: Flink
> Issue Type: New Feature
> Components: API / DataStream
> Reporter: Gábor Gévay
> Assignee: Gábor Gévay
> Priority: Not a Priority
> Labels: gsoc2015, stale-assigned, statistics, streaming
>
> The goal of this project is to implement basic statistics of data streams and
> windows (like average, median, variance, correlation, etc.) in a
> computationally efficient manner. This involves designing custom PreReducers.
> The exact calculation of some statistics (e.g., frequencies or the number of
> distinct elements) would require memory proportional to the number of
> elements in the input (the window or the entire stream). However, there are
> efficient algorithms and data structures using less memory for calculating
> the same statistics only approximately, with user-specified error bounds.
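
To make the last point of the description concrete, here is a minimal sketch of one well-known approximate structure, the Count-Min sketch, which estimates element frequencies in fixed memory. It is only an illustration and not the design proposed in this issue or any Flink API: the class name `CountMinSketch` and its methods are hypothetical, and the sizing rule (width = ceil(e / epsilon), depth = ceil(ln(1 / delta))) is the standard textbook one. The `merge` method is included because partial sketches that can be combined are the kind of thing a pre-reducer over windows would need.

```python
import hashlib
import math


class CountMinSketch:
    """Approximate frequency counter (hypothetical helper, not a Flink class).

    An estimate is never below the true count. With the standard sizing used
    here, it exceeds the true count by more than epsilon * total only with
    probability at most delta.
    """

    def __init__(self, epsilon: float, delta: float) -> None:
        if not (epsilon > 0 and 0 < delta < 1):
            raise ValueError("need epsilon > 0 and 0 < delta < 1")
        self.width = math.ceil(math.e / epsilon)       # counters per row
        self.depth = math.ceil(math.log(1.0 / delta))  # number of hash rows
        self.total = 0                                 # sum of all added counts
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _index(self, item: str, row: int) -> int:
        # One salted hash per row; blake2b keeps results stable across runs,
        # unlike Python's built-in hash() for strings.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        """Record `count` occurrences of `item`."""
        self.total += count
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        """Return an upper-biased estimate of how often `item` was added."""
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )

    def merge(self, other: "CountMinSketch") -> None:
        """Fold another sketch of the same shape into this one."""
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketches must have identical dimensions")
        self.total += other.total
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]
```

Memory use here depends only on epsilon and delta, not on the number of elements seen, which is the trade-off the description refers to. An exact counter would instead need one entry per distinct element.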
--
This message was sent by Atlassian Jira
(v8.3.4#803005)