[
https://issues.apache.org/jira/browse/FLINK-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253383#comment-15253383
]
Austin Ouyang edited comment on FLINK-1284 at 4/22/16 4:40 PM:
---------------------------------------------------------------
Hi [~senorcarbone],
Would we also want to support sampling by percentage? Also, what would
the fieldID refer to? I can think of two naive approaches:
1) Once the trigger fires, randomly draw either N samples or a given
percentage of the elements in each window.
2) Given the percentage of elements to retain from each window, generate a
random number between 0 and 1 for each element. Append the element to the
result if the random number is less than that percentage (as a fraction).
I'd be happy to try working on this as well!
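
To make the two approaches concrete, here is a minimal sketch in plain Python.
It is not Flink API: the function names sample_after_trigger and
sample_per_element, and the count/fraction parameters, are hypothetical and
only illustrate the idea on an in-memory list standing in for one window.

```python
import random


def sample_after_trigger(window, count=None, fraction=None, rng=None):
    """Approach 1: the window is complete, draw without replacement.

    Exactly one of `count` (N samples) or `fraction` (0..1) must be given.
    """
    if (count is None) == (fraction is None):
        raise ValueError("specify exactly one of count or fraction")
    rng = rng or random.Random()
    items = list(window)
    if fraction is not None:
        count = round(fraction * len(items))
    # Never ask for more elements than the window holds.
    count = min(count, len(items))
    return rng.sample(items, count)


def sample_per_element(window, fraction, rng=None):
    """Approach 2: keep each element independently with probability `fraction`."""
    rng = rng or random.Random()
    # rng.random() is in [0, 1), so fraction=1.0 keeps every element.
    return [element for element in window if rng.random() < fraction]
```

Approach 1 returns an exact sample size but needs the whole window buffered
until the trigger fires. Approach 2 decides per element and needs no buffer,
but the sample size then varies from window to window.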
> Uniform random sampling operator over windows
> ---------------------------------------------
>
> Key: FLINK-1284
> URL: https://issues.apache.org/jira/browse/FLINK-1284
> Project: Flink
> Issue Type: New Feature
> Components: Streaming
> Reporter: Paris Carbone
> Priority: Minor
>
> It would be useful for several use cases to have a built-in uniform random
> sampling operator in the streaming API that can operate on windows. This can
> be used for example for online machine learning operations, evaluating
> heuristics or continuous visualisation of representative values.
> The operator could be given a field and a number of random samples needed,
> following a window statement as such:
> mystream.window(..).sample(fieldID,#samples)
> Given that pre-aggregation is enabled, this could perhaps be implemented as a
> binary reduce operator or a combinable groupreduce that pre-aggregates the
> empiricals of that field.
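
One possible reading of the "binary reduce" idea in the description above,
sketched in plain Python rather than Flink API: each partial result carries a
sample plus the number of elements it was drawn from, and two partials are
merged by drawing from each side in proportion to the elements it still
represents. The name merge_samples and the (sample, seen) pair are
hypothetical, and the sketch assumes every partial holds min(k, seen)
uniformly chosen elements.

```python
import random


def merge_samples(a, b, k, rng=None):
    """Merge two partial samples into a uniform sample of at most k elements.

    Each argument is a (sample, seen) pair, where `sample` holds
    min(k, seen) elements drawn uniformly from `seen` input elements.
    """
    rng = rng or random.Random()
    (sample_a, seen_a), (sample_b, seen_b) = a, b
    pool_a, pool_b = list(sample_a), list(sample_b)
    # Shuffle so that pop() removes a uniformly random remaining element.
    rng.shuffle(pool_a)
    rng.shuffle(pool_b)
    remaining_a, remaining_b = seen_a, seen_b
    merged = []
    while len(merged) < k and remaining_a + remaining_b > 0:
        # Choose a side with probability proportional to what it still represents.
        if rng.randrange(remaining_a + remaining_b) < remaining_a:
            merged.append(pool_a.pop())
            remaining_a -= 1
        else:
            merged.append(pool_b.pop())
            remaining_b -= 1
    return merged, seen_a + seen_b
```

A single element x would enter the reduce as the partial ([x], 1), so the
same function can serve both as a combiner and as the final reduce step.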