[ https://issues.apache.org/jira/browse/FLINK-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253383#comment-15253383 ]
Austin Ouyang commented on FLINK-1284:
--------------------------------------

Hi Paris,

Would we also want to add the ability to sample by percentage? Also, what would fieldID refer to?

I can think of two naive solutions:
1) Once the window trigger fires, randomly draw either N samples or a percentage of all the elements in each window.
2) Given a percentage of elements to retain from each window, generate a random number between 0 and 1 for each element, and append the element to the result if the random number is less than the specified percentage (expressed as a fraction).

> Uniform random sampling operator over windows
> ---------------------------------------------
>
>                 Key: FLINK-1284
>                 URL: https://issues.apache.org/jira/browse/FLINK-1284
>             Project: Flink
>          Issue Type: New Feature
>          Components: Streaming
>            Reporter: Paris Carbone
>            Priority: Minor
>
> It would be useful for several use cases to have a built-in uniform random
> sampling operator in the streaming API that can operate on windows. This can
> be used for example for online machine learning operations, evaluating
> heuristics or continuous visualisation of representative values.
> The operator could be given a field and a number of random samples needed,
> following a window statement as such:
> mystream.window(..).sample(fieldID,#samples)
> Given that pre-aggregation is enabled, this could perhaps be implemented as a
> binary reduce operator or a combinable groupreduce that pre-aggregates the
> empiricals of that field.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
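
To make the difference between the two options in the comment concrete, here is a minimal sketch in plain Python. It is not Flink code: the function names sample_fixed and sample_bernoulli, the n and fraction parameters, and the idea of passing the window as an in-memory list are all assumptions made for illustration. Option 1 needs the whole window buffered but returns an exact sample size; option 2 decides per element, so it needs no buffering, but the sample size varies from window to window.

```python
import random


def sample_fixed(window, n=None, fraction=None, rng=random):
    """Option 1 (hypothetical helper): once the window has fired,
    draw a uniform sample without replacement, sized either by an
    absolute count `n` or by a `fraction` of the window."""
    if (n is None) == (fraction is None):
        raise ValueError("give exactly one of n or fraction")
    items = list(window)
    if fraction is not None:
        n = round(len(items) * fraction)
    # Never ask for more elements than the window holds.
    n = min(n, len(items))
    return rng.sample(items, n)


def sample_bernoulli(window, fraction, rng=random):
    """Option 2 (hypothetical helper): keep each element
    independently with probability `fraction` (0.0 to 1.0)."""
    return [x for x in window if rng.random() < fraction]
```

With a 100-element window, sample_fixed(window, fraction=0.25) always returns 25 elements, while sample_bernoulli(window, 0.25) returns 25 only on average.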