[
https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636451#comment-14636451
]
Till Rohrmann commented on FLINK-1901:
--------------------------------------
If you use the sampling operator this way, it works. Usually, however, the
iteration data set is something like the weight vector of your model, and you
have a separate training data set from which you want to draw a small sample
to update the weight vector in each iteration (e.g. SGD). If you write a
program like that, you'll see that the output of the sampling operator is
the same in every iteration. The reason is that the sampling is no longer on
the dynamic path of the iteration, so it is calculated only once and then
cached. This is not the intended behaviour, though.
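
To make the effect concrete, here is a minimal plain-Python sketch of the
behaviour described above. It does not use Flink at all; the names `sample`,
`run_iterations` and `on_dynamic_path` are hypothetical and only imitate the
difference between an operator that is evaluated once and cached (static
path) and one that is re-evaluated in every iteration (dynamic path).

```python
import random


def sample(data, fraction, rng):
    """Bernoulli sampling without replacement: keep each element
    independently with probability `fraction`."""
    return [x for x in data if rng.random() < fraction]


def run_iterations(data, fraction, n_iters, seed, on_dynamic_path):
    """Return the mini-batch that each iteration would see.

    on_dynamic_path=False imitates a sampling operator that is computed
    once and cached; on_dynamic_path=True imitates one that is re-run in
    every iteration.
    """
    rng = random.Random(seed)
    cached = None
    batches = []
    for _ in range(n_iters):
        if on_dynamic_path:
            # Re-evaluated each time: a fresh sample per iteration.
            batch = sample(data, fraction, rng)
        else:
            # Evaluated on first use only, then served from the cache.
            if cached is None:
                cached = sample(data, fraction, rng)
            batch = cached
        batches.append(batch)
    return batches


training = list(range(100))
static_batches = run_iterations(training, 0.1, 5, seed=42, on_dynamic_path=False)
dynamic_batches = run_iterations(training, 0.1, 5, seed=42, on_dynamic_path=True)
```

In the cached case every iteration sees the same mini-batch, so an SGD-style
update would keep using the same few training points; in the re-evaluated
case the mini-batch changes from one iteration to the next.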
> Create sample operator for Dataset
> ----------------------------------
>
> Key: FLINK-1901
> URL: https://issues.apache.org/jira/browse/FLINK-1901
> Project: Flink
> Issue Type: Improvement
> Components: Core
> Reporter: Theodore Vasiloudis
> Assignee: Chengxiang Li
>
> In order to be able to implement Stochastic Gradient Descent and a number of
> other machine learning algorithms we need to have a way to take a random
> sample from a Dataset.
> We need to be able to sample with or without replacement from the Dataset,
> choose the relative size of the sample, and set a seed for reproducibility.