[ 
https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634826#comment-14634826
 ] 

Till Rohrmann commented on FLINK-1901:
--------------------------------------

To be honest, I doubt that the sampling is executed repeatedly unless you are 
sampling from the iteration data set itself. Map and reduce operations that lie 
on the static path are executed once and their results are cached. It would be 
best to check the samples to confirm this.
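
One way to check the samples is to compare what each iteration actually sees. The sketch below is plain Python, not Flink code. It only mimics the two behaviours under discussion: a sample on the static path that is computed once and reused, versus a sample that is redrawn in every iteration. The names `draw_sample` and `samples_per_iteration` are hypothetical and exist only for this illustration.

```python
import random


def draw_sample(data, fraction, rng):
    """Keep each element independently with probability `fraction`."""
    return [x for x in data if rng.random() < fraction]


def samples_per_iteration(data, fraction, seed, iterations, resample):
    """Return the sample observed in each iteration.

    resample=False mimics a static-path result: drawn once, then reused.
    resample=True mimics sampling that is re-executed in every iteration.
    """
    rng = random.Random(seed)
    cached = draw_sample(data, fraction, rng)  # computed once, before the loop
    seen = []
    for _ in range(iterations):
        current = draw_sample(data, fraction, rng) if resample else cached
        seen.append(current)
    return seen


if __name__ == "__main__":
    data = list(range(1000))
    static = samples_per_iteration(data, 0.1, seed=42, iterations=5, resample=False)
    dynamic = samples_per_iteration(data, 0.1, seed=42, iterations=5, resample=True)
    # Identical samples in every iteration suggest the result was cached.
    print("static path, distinct samples:", len({tuple(s) for s in static}))
    print("resampled, distinct samples:", len({tuple(s) for s in dynamic}))
```

If every iteration reports the same sample, the sampling was most likely executed once and cached.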

If it is possible to split this out into a separate PR, that would be great. 
It makes reviewing much easier.

> Create sample operator for Dataset
> ----------------------------------
>
>                 Key: FLINK-1901
>                 URL: https://issues.apache.org/jira/browse/FLINK-1901
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Theodore Vasiloudis
>            Assignee: Chengxiang Li
>
> In order to be able to implement Stochastic Gradient Descent and a number of 
> other machine learning algorithms we need to have a way to take a random 
> sample from a Dataset.
> We need to be able to sample with or without replacement from the Dataset, 
> choose the relative size of the sample, and set a seed for reproducibility.
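
As a rough illustration of the three requirements above (replacement or not, relative sample size, seed), here is a minimal plain-Python sketch. It is not Flink's API and not necessarily how the operator is implemented. It assumes Bernoulli selection for sampling without replacement and a per-element Poisson draw for sampling with replacement. The function names are hypothetical.

```python
import math
import random


def sample_without_replacement(data, fraction, seed):
    """Bernoulli sampling: keep each element independently with
    probability `fraction`, so no element appears more than once."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)
    return [x for x in data if rng.random() < fraction]


def _poisson(rng, mean):
    """Draw a Poisson-distributed count (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_with_replacement(data, fraction, seed):
    """Poisson sampling: emit each element k times, with k drawn from
    Poisson(fraction), so an element may appear repeatedly."""
    if fraction < 0.0:
        raise ValueError("fraction must be non-negative")
    rng = random.Random(seed)
    out = []
    for x in data:
        out.extend([x] * _poisson(rng, fraction))
    return out


if __name__ == "__main__":
    data = list(range(100))
    # The same seed reproduces the same sample.
    print(sample_without_replacement(data, 0.1, seed=7))
    print(sample_with_replacement(data, 0.1, seed=7))
```

In both variants the expected sample size is `fraction * len(data)`, but the actual size varies from run to run unless the seed is fixed.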



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
