[ 
https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636451#comment-14636451
 ] 

Till Rohrmann commented on FLINK-1901:
--------------------------------------

If you use the sampling operator this way, it works. Usually, however, the 
iteration data set is something like the weight vector of your model, and you 
have a separate training data set from which you want to draw a small sample 
to update the weight vector in each iteration (e.g. SGD). If you write a 
program like that, you will see that the output of the sampling operator is 
the same in every iteration. The reason is that the sampling is no longer on 
the dynamic path of the iteration, so it is calculated only once and then 
cached. This is not the intended behaviour, though.
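
To make the difference concrete, here is a minimal plain-Python sketch of the 
two behaviours. It does not use the Flink API; the function name 
run_iterations and its parameters are made up for illustration, and the 
"cached" branch only imitates what happens when the sampler sits outside the 
dynamic path.

```python
import random


def run_iterations(data, iterations, fraction, seed, resample_each_iteration):
    """Toy SGD-style loop that fits a single weight to the mean of `data`.

    Hypothetical helper: it only imitates the caching effect described above.
    """
    rng = random.Random(seed)
    k = max(1, round(fraction * len(data)))
    # Drawn once before the loop, like an operator off the dynamic path.
    cached = rng.sample(data, k)
    weight = 0.0
    batches = []
    for _ in range(iterations):
        # On the dynamic path the sample would be recomputed in every iteration.
        batch = rng.sample(data, k) if resample_each_iteration else cached
        # Gradient of 0.5 * (weight - x)^2 averaged over the batch.
        gradient = sum(weight - x for x in batch) / k
        weight -= 0.5 * gradient
        batches.append(tuple(batch))
    return weight, batches


data = list(range(1000))
_, static_batches = run_iterations(data, 5, 0.01, seed=42,
                                   resample_each_iteration=False)
_, dynamic_batches = run_iterations(data, 5, 0.01, seed=42,
                                    resample_each_iteration=True)
```

With the cached sample every iteration sees the identical batch, so the 
update is no longer stochastic. With resampling the batches differ from one 
iteration to the next, which is what SGD expects.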

> Create sample operator for Dataset
> ----------------------------------
>
>                 Key: FLINK-1901
>                 URL: https://issues.apache.org/jira/browse/FLINK-1901
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Theodore Vasiloudis
>            Assignee: Chengxiang Li
>
> In order to be able to implement Stochastic Gradient Descent and a number of 
> other machine learning algorithms we need to have a way to take a random 
> sample from a Dataset.
> We need to be able to sample with or without replacement from the Dataset, 
> choose the relative size of the sample, and set a seed for reproducibility.
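
The three requirements above (with or without replacement, relative size, 
seed) could be sketched roughly as follows. This is plain Python rather than 
the Flink operator; the function name sample and its parameters are 
assumptions made for illustration, and it draws a fixed-size sample, which is 
only one of several possible sampling strategies.

```python
import random


def sample(data, fraction, with_replacement=False, seed=None):
    """Return a random sample of about `fraction * len(data)` elements.

    Hypothetical helper, not the Flink API.
    """
    if fraction < 0.0:
        raise ValueError("fraction must be non-negative")
    if not with_replacement and fraction > 1.0:
        raise ValueError("fraction must be <= 1 when sampling without replacement")
    # A private generator keeps the result reproducible for a given seed.
    rng = random.Random(seed)
    k = round(fraction * len(data))
    if with_replacement:
        # Elements may appear more than once.
        return rng.choices(data, k=k)
    # Each position in `data` is picked at most once.
    return rng.sample(data, k)


records = list(range(100))
without_repl = sample(records, 0.2, with_replacement=False, seed=7)
with_repl = sample(records, 0.2, with_replacement=True, seed=7)
```

Calling sample twice with the same seed returns the same elements, which 
covers the reproducibility requirement.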



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
