[
https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661258#comment-14661258
]
ASF GitHub Bot commented on FLINK-1901:
---------------------------------------
Github user ChengXiangLi commented on the pull request:
https://github.com/apache/flink/pull/949#issuecomment-128580607
Hi @tillrohrmann, the current implementation of fixed-size sampling draws a
fixed-size random sample from each partition rather than from the whole
dataset; most of the time users would probably expect the latter. I'm
researching how to randomly sample a fixed number of elements from a
distributed data stream, so I think we can pause the review of this PR until
I merge the previous fix.
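
To illustrate the difference between a per-partition sample and a sample over
the whole dataset, here is a minimal sketch of one well-known two-phase
approach. It is not the code in this PR, and the class and method names
(FixedSizeSampleSketch, samplePartition, merge) are hypothetical. Each element
gets a uniform random weight, every partition keeps its k largest weights, and
a single merge step keeps the k largest overall. Plain Java collections stand
in for partitions here; no Flink API is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

class FixedSizeSampleSketch {

    /** An element paired with the random weight drawn for it. */
    static final class Weighted<T> {
        final double weight;
        final T value;

        Weighted(double weight, T value) {
            this.weight = weight;
            this.value = value;
        }
    }

    /** Phase 1, run once per partition: keep the k elements with the largest weights. */
    static <T> List<Weighted<T>> samplePartition(Iterable<T> partition, int k, Random rnd) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Min-heap on weight, so the smallest retained weight is evicted first.
        PriorityQueue<Weighted<T>> heap =
                new PriorityQueue<>(Comparator.comparingDouble((Weighted<T> w) -> w.weight));
        for (T value : partition) {
            double weight = rnd.nextDouble();
            if (heap.size() < k) {
                heap.add(new Weighted<>(weight, value));
            } else if (weight > heap.peek().weight) {
                heap.poll();
                heap.add(new Weighted<>(weight, value));
            }
        }
        return new ArrayList<>(heap);
    }

    /** Phase 2, run once: merge all partition candidates and keep the global top k. */
    static <T> List<T> merge(List<List<Weighted<T>>> candidates, int k) {
        List<Weighted<T>> all = new ArrayList<>();
        for (List<Weighted<T>> partitionCandidates : candidates) {
            all.addAll(partitionCandidates);
        }
        all.sort(Comparator.comparingDouble((Weighted<T> w) -> w.weight).reversed());
        List<T> sample = new ArrayList<>();
        for (int i = 0; i < Math.min(k, all.size()); i++) {
            sample.add(all.get(i).value);
        }
        return sample;
    }

    public static void main(String[] args) {
        // Two "partitions" of very different sizes; seeds are arbitrary but distinct.
        List<Integer> big = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            big.add(i);
        }
        List<Integer> small = Arrays.asList(1000, 1001, 1002);

        List<List<Weighted<Integer>>> candidates = new ArrayList<>();
        candidates.add(samplePartition(big, 5, new Random(1L)));
        candidates.add(samplePartition(small, 5, new Random(2L)));

        // Exactly 5 elements in total, not 5 per partition.
        System.out.println(merge(candidates, 5));
    }
}
```

Because any element among the k largest weights overall is also among the k
largest in its own partition, the merge step loses nothing, and each partition
ships at most k candidates. The sketch assumes every partition uses an
independently seeded random generator.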
> Create sample operator for Dataset
> ----------------------------------
>
> Key: FLINK-1901
> URL: https://issues.apache.org/jira/browse/FLINK-1901
> Project: Flink
> Issue Type: Improvement
> Components: Core
> Reporter: Theodore Vasiloudis
> Assignee: Chengxiang Li
>
> In order to be able to implement Stochastic Gradient Descent and a number of
> other machine learning algorithms we need to have a way to take a random
> sample from a Dataset.
> We need to be able to sample with or without replacement from the Dataset,
> choose the relative size of the sample, and set a seed for reproducibility.
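
The requirements above (with or without replacement, a relative sample size,
and a seed) can be sketched with two standard techniques: Bernoulli sampling
for the without-replacement case and Poisson sampling for the
with-replacement case. This is only an illustration on plain Java
collections; the names (FractionSampleSketch, sampleWithoutReplacement,
sampleWithReplacement) are hypothetical and are not Flink's API. Note that
the fraction sets the expected sample size, not an exact one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class FractionSampleSketch {

    /** Without replacement: keep each element independently with probability `fraction`. */
    static <T> List<T> sampleWithoutReplacement(Iterable<T> data, double fraction, long seed) {
        Random rnd = new Random(seed);
        List<T> out = new ArrayList<>();
        for (T value : data) {
            if (rnd.nextDouble() < fraction) {
                out.add(value);
            }
        }
        return out;
    }

    /** With replacement: emit each element a Poisson(fraction)-distributed number of times. */
    static <T> List<T> sampleWithReplacement(Iterable<T> data, double fraction, long seed) {
        Random rnd = new Random(seed);
        List<T> out = new ArrayList<>();
        for (T value : data) {
            int copies = poisson(fraction, rnd);
            for (int i = 0; i < copies; i++) {
                out.add(value);
            }
        }
        return out;
    }

    /** Knuth's multiplication method for Poisson variates; adequate for small means. */
    static int poisson(double mean, Random rnd) {
        double limit = Math.exp(-mean);
        double product = rnd.nextDouble();
        int count = 0;
        while (product > limit) {
            count++;
            product *= rnd.nextDouble();
        }
        return count;
    }
}
```

Calling either method twice with the same data, fraction, and seed returns
the same sample, which is what the reproducibility requirement asks for.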