Github user sachingoel0101 commented on the pull request:
https://github.com/apache/flink/pull/757#issuecomment-117220314
Hey @thvasilo, I'm going to break up this PR further. The motivation is
that the sampling code should be available as a general feature: given a
probability distribution over the data, a user should be able to draw as many
samples as they want.
The Sampler will take as input the DataSet, the number of samples required, and
a function that determines the relative probability of a particular element
being picked. The caller will also specify whether elements are sampled with
or without replacement.
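
To make the proposed interface concrete, here is a rough single-machine sketch in Python. It is not Flink code and not the implementation from this PR. The name `weighted_sample` and all parameter names are hypothetical, and a plain in-memory list stands in for the DataSet. For sampling without replacement, the sketch assumes the random-key technique of Efraimidis and Spirakis (key = u^(1/w), keep the largest keys), which is one possible choice among several.

```python
import heapq
import random
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def weighted_sample(
    data: Iterable[T],
    num_samples: int,
    weight_fn: Callable[[T], float],
    with_replacement: bool,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Hypothetical sampler: draw num_samples elements from data, where
    weight_fn gives each element's relative (unnormalized) probability."""
    rng = rng or random.Random()
    items = list(data)
    weights = [float(weight_fn(x)) for x in items]
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")

    if with_replacement:
        # Each draw is independent, so any number of samples is allowed
        # as long as at least one element has positive weight.
        if sum(weights) <= 0:
            raise ValueError("at least one element needs a positive weight")
        return rng.choices(items, weights=weights, k=num_samples)

    # Without replacement: zero-weight elements can never be picked.
    candidates = [(x, w) for x, w in zip(items, weights) if w > 0]
    if num_samples > len(candidates):
        raise ValueError("not enough elements with positive weight")
    # Assumed technique (Efraimidis-Spirakis): give each element the key
    # u ** (1 / w) with u uniform in [0, 1), then keep the largest keys.
    keyed = [
        (rng.random() ** (1.0 / w), i) for i, (_, w) in enumerate(candidates)
    ]
    top = heapq.nlargest(num_samples, keyed)
    return [candidates[i][0] for _, i in top]
```

For example, `weighted_sample(range(10), 5, lambda x: x + 1, False)` would return five distinct values, favouring the larger ones. The key-based approach is used here because each key depends only on its own element, so the same idea could plausibly carry over to a partitioned DataSet, though that is untested in this sketch.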
Let me know your thoughts. I'll work out a version in the meantime. If this
is desirable, I will file a JIRA ticket and open a separate PR.