[ https://issues.apache.org/jira/browse/FLINK-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284474#comment-15284474 ]
Gabor Gevay commented on FLINK-2142:
------------------------------------

(I've also broken off FLINK-2144.)

> GSoC project: Exact and Approximate Statistics for Data Streams and Windows
> ---------------------------------------------------------------------------
>
> Key: FLINK-2142
> URL: https://issues.apache.org/jira/browse/FLINK-2142
> Project: Flink
> Issue Type: New Feature
> Components: Streaming
> Reporter: Gabor Gevay
> Assignee: Gabor Gevay
> Priority: Minor
> Labels: gsoc2015, statistics, streaming
>
> The goal of this project is to implement basic statistics of data streams and
> windows (like average, median, variance, correlation, etc.) in a
> computationally efficient manner. This involves designing custom PreReducers.
> The exact calculation of some statistics (eg. frequencies, or the number of
> distinct elements) would require memory proportional to the number of
> elements in the input (the window or the entire stream). However, there are
> efficient algorithms and data structures using less memory for calculating
> the same statistics only approximately, with user-specified error bounds.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)