[ 
https://issues.apache.org/jira/browse/FLINK-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284446#comment-15284446
 ] 

Stavros Kontopoulos commented on FLINK-2147:
--------------------------------------------

Ok so if there multiple windows evaluated at different times in parallel since 
data comes out of order, what kind of statistic is computable in this model? 
What are the correct semantics here? Emit a statistic update only when ordering 
is reconstructed (appropriate windows are calculated) and delay future results? 
What about count min?

> Approximate calculation of frequencies in data streams
> ------------------------------------------------------
>
>                 Key: FLINK-2147
>                 URL: https://issues.apache.org/jira/browse/FLINK-2147
>             Project: Flink
>          Issue Type: New Feature
>          Components: Streaming
>            Reporter: Gabor Gevay
>              Labels: approximate, statistics
>
> Count-Min sketch is a hashing-based algorithm for approximately keeping track 
> of the frequencies of elements in a data stream. It is described by Cormode 
> et al. in the following paper:
> http://dimacs.rutgers.edu/~graham/pubs/papers/cmsoft.pdf
> Note that this algorithm can be conveniently implemented in a distributed 
> way, as described in section 3.2 of the paper.
> The paper
> http://www.vldb.org/conf/2002/S10P03.pdf
> also describes algorithms for approximately keeping track of frequencies, but 
> here the user can specify a threshold below which she is not interested in 
> the frequency of an element. The error-bounds are also different than the 
> Count-min sketch algorithm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to