[ 
https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635082#comment-14635082
 ] 

Sachin Goel commented on FLINK-1901:
------------------------------------

I checked the whole code, and the random sampling I'm using there is never 
used inside an iteration, so there are no problems on that front. However, it 
would certainly be good to have a separate random sampling module that can 
work on any data set.
[[email protected]], do you think there is any utility for a sampling 
procedure other than uniform random sampling? For example, suppose there is a 
function that maps every element in the data set to its probability of 
selection.
[~chengxiang li], yes, there is an ongoing PR 
(https://github.com/apache/flink/pull/757), and it would certainly make 
sense to have a generic sample function.
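
To make the per-element probability idea concrete, here is a minimal sketch in 
plain Python. It is not Flink code and not what the PR implements; the name 
sample_with_probability and its parameters are hypothetical. Each element is 
kept independently with the probability returned by a user-supplied function, 
so uniform sampling is just the special case where that function is constant.

```python
import random
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def sample_with_probability(
    elements: Iterable[T],
    prob: Callable[[T], float],
    seed: Optional[int] = None,
) -> Iterator[T]:
    """Keep each element independently with probability prob(element).

    Hypothetical helper for illustration only; not part of any Flink API.
    """
    rng = random.Random(seed)  # private generator, so a seed gives repeatable runs
    for element in elements:
        p = prob(element)
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"selection probability out of range: {p}")
        # rng.random() is in [0.0, 1.0), so p == 1.0 always keeps the element
        # and p == 0.0 never does.
        if rng.random() < p:
            yield element


if __name__ == "__main__":
    data = list(range(20))
    # Assumed weighting for the example: larger values are more likely to be kept.
    picked = list(sample_with_probability(data, lambda x: x / 20.0, seed=7))
    print(picked)
```

Because the decision for each element depends only on that element and the 
generator state, this style of sampling needs a single pass and no knowledge 
of the data set size.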

> Create sample operator for Dataset
> ----------------------------------
>
>                 Key: FLINK-1901
>                 URL: https://issues.apache.org/jira/browse/FLINK-1901
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Theodore Vasiloudis
>            Assignee: Chengxiang Li
>
> In order to be able to implement Stochastic Gradient Descent and a number of 
> other machine learning algorithms we need to have a way to take a random 
> sample from a Dataset.
> We need to be able to sample with or without replacement from the Dataset, 
> choose the relative size of the sample, and set a seed for reproducibility.
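
The three requirements in the description above (with or without replacement, 
relative sample size, seed) can be sketched for an in-memory sequence as 
follows. This is plain Python using only the standard library, not a Flink 
operator; the function name sample and its parameters are hypothetical, and a 
distributed implementation would not be able to rely on knowing the data set 
length up front as this sketch does.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample(
    data: Sequence[T],
    fraction: float,
    with_replacement: bool = False,
    seed: Optional[int] = None,
) -> List[T]:
    """Draw round(fraction * len(data)) elements from an in-memory sequence.

    Hypothetical helper for illustration only; not part of any Flink API.
    """
    if fraction < 0.0:
        raise ValueError("fraction must be non-negative")
    if not with_replacement and fraction > 1.0:
        raise ValueError("fraction cannot exceed 1.0 without replacement")

    rng = random.Random(seed)  # same seed, same sample
    k = round(fraction * len(data))
    if with_replacement:
        # Elements may repeat, and the sample may be larger than the input.
        return rng.choices(data, k=k)
    # Each position in the input is picked at most once.
    return rng.sample(data, k)


if __name__ == "__main__":
    points = list(range(100))
    print(sample(points, 0.1, seed=42))
    print(sample(points, 0.1, with_replacement=True, seed=42))
```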



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)