[
https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638272#comment-14638272
]
Chengxiang Li commented on FLINK-1901:
--------------------------------------
Thanks, [~till.rohrmann], I get it now. This looks more like an iteration
optimization issue to me: the optimization assumes that the output of the
static code path is always the same, so it caches that output for a potential
performance improvement. That assumption does not always hold, for example
when the static code path contains a random sampling operator or a data source
that reads from HBase. I think we could open a separate JIRA to address it in
a uniform way instead of treating random sampling as a special case in this
JIRA.
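
To make the concern concrete, here is a minimal sketch in plain Python. It
does not use Flink's API or reflect its iteration runtime; the names
run_iteration and make_sampling_path are hypothetical and exist only for
illustration. It shows how caching the output of a static path changes
behaviour once that path is nondeterministic.

```python
import random


def make_sampling_path(data, sample_size, seed):
    """Build a hypothetical 'static path' that draws a new random sample per call."""
    rng = random.Random(seed)

    def static_path():
        # Nondeterministic: every call advances the generator and yields a new draw.
        return rng.sample(data, sample_size)

    return static_path


def run_iteration(static_path, num_iterations, cache_static_output):
    """Return what each iteration step observes from the static path."""
    observed = []
    cached = None
    for _ in range(num_iterations):
        if cache_static_output:
            # Evaluate the static path once and reuse the result afterwards.
            if cached is None:
                cached = static_path()
            observed.append(cached)
        else:
            # Re-evaluate the static path in every iteration step.
            observed.append(static_path())
    return observed


data = list(range(100))
cached_runs = run_iteration(make_sampling_path(data, 5, seed=7), 3, True)
fresh_runs = run_iteration(make_sampling_path(data, 5, seed=7), 3, False)
```

With caching enabled, every iteration step sees the first sample again, which
is not what an algorithm that expects a fresh sample per step would want. The
same reasoning would apply to any source whose contents can change between
reads.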
> Create sample operator for Dataset
> ----------------------------------
>
> Key: FLINK-1901
> URL: https://issues.apache.org/jira/browse/FLINK-1901
> Project: Flink
> Issue Type: Improvement
> Components: Core
> Reporter: Theodore Vasiloudis
> Assignee: Chengxiang Li
>
> In order to be able to implement Stochastic Gradient Descent and a number of
> other machine learning algorithms we need to have a way to take a random
> sample from a Dataset.
> We need to be able to sample with or without replacement from the Dataset,
> choose the relative size of the sample, and set a seed for reproducibility.
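
The three requirements in the description (with or without replacement,
relative sample size, seed) can be sketched as follows. This is a plain
Python illustration on an in-memory list, not the operator proposed for Flink;
the function name sample and its parameters are assumptions made for this
example.

```python
import random


def sample(data, fraction, with_replacement=False, seed=None):
    """Draw round(fraction * len(data)) elements from data.

    fraction is the sample size relative to the input size. Passing the same
    seed with the same input reproduces the same sample.
    """
    if fraction < 0:
        raise ValueError("fraction must be non-negative")
    if not with_replacement and fraction > 1:
        raise ValueError("fraction must be <= 1 when sampling without replacement")

    rng = random.Random(seed)
    items = list(data)
    k = round(fraction * len(items))

    if with_replacement:
        # Elements may appear more than once in the result.
        return rng.choices(items, k=k)
    # Each input position is selected at most once.
    return rng.sample(items, k)
```

For example, sample(range(10), 0.5, seed=42) returns five distinct elements,
and calling it again with the same seed returns the same five. A distributed
operator cannot materialise the whole input like this, so it would need a
streaming scheme instead; the sketch only pins down the intended semantics.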
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)