[ 
https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638272#comment-14638272
 ] 

Chengxiang Li commented on FLINK-1901:
--------------------------------------

Thanks, [~till.rohrmann], I get it now. This looks more like an iteration 
optimization issue to me: the optimizer assumes that the output of the static 
code path is always the same, so it caches that output for a potential 
performance improvement. That assumption does not always hold, for example 
when the static code path contains a random sampling operator or a data source 
that reads from HBase. I think we could open a separate JIRA to address this in 
a uniform way instead of treating random sampling as a special case in this 
JIRA.
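
To make the concern concrete, here is a minimal sketch in plain Python (not the 
Flink API). The names `sample`, `run_iterations`, and the `cache_static_path` 
flag are hypothetical and only mimic the idea of caching the static path's 
output across iterations; the actual optimizer behaviour may differ.

```python
import random


def sample(data, fraction, rng):
    """Bernoulli sampling: keep each element with probability `fraction`."""
    return [x for x in data if rng.random() < fraction]


def run_iterations(data, fraction, seed, n_iters, cache_static_path):
    """Return the sample seen by each iteration.

    With cache_static_path=True the sample is computed once and reused,
    which imitates caching the output of a static code path (assumption).
    With cache_static_path=False the sample is recomputed every iteration.
    """
    rng = random.Random(seed)
    cached = None
    seen = []
    for _ in range(n_iters):
        if cache_static_path:
            if cached is None:
                cached = sample(data, fraction, rng)
            batch = cached
        else:
            batch = sample(data, fraction, rng)
        seen.append(batch)
    return seen


data = list(range(1000))
cached_runs = run_iterations(data, 0.1, seed=42, n_iters=5, cache_static_path=True)
fresh_runs = run_iterations(data, 0.1, seed=42, n_iters=5, cache_static_path=False)

# Cached: every iteration sees the identical sample.
print(all(batch == cached_runs[0] for batch in cached_runs))
# Recomputed: count how many distinct samples the iterations saw.
print(len({tuple(batch) for batch in fresh_runs}))
```

With caching, an algorithm such as SGD would train on the same mini-batch in 
every iteration, which is why a non-deterministic static path needs to be 
handled explicitly.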

> Create sample operator for Dataset
> ----------------------------------
>
>                 Key: FLINK-1901
>                 URL: https://issues.apache.org/jira/browse/FLINK-1901
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Theodore Vasiloudis
>            Assignee: Chengxiang Li
>
> In order to implement Stochastic Gradient Descent and a number of other 
> machine learning algorithms, we need a way to take a random sample from a 
> DataSet.
> We need to be able to sample with or without replacement from the DataSet, 
> choose the relative size of the sample, and set a seed for reproducibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
