[ 
https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635131#comment-14635131
 ] 

Sachin Goel commented on FLINK-1901:
------------------------------------

Hi [~trohrm...@apache.org], I checked the code and, as you said, it doesn't 
work: the results are cached. 
However, I also tested what you said about the results being cached in each 
iteration. What if we just broadcast the current solution to the operator 
that does the sampling? That would trigger execution of the sampling at each 
iteration. Here's an example that tests this: 
https://gist.github.com/sachingoel0101/0908f28e324c1fae687c
Is this too "hacky"? We could provide two sampling functions, one that works 
outside iterations and one that works inside them.
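
To make the "two sampling functions" idea a bit more concrete, here is a rough 
plain-Java sketch. It does not use the Flink API and is not the code from the 
gist; the class and method names (IterationSampling, sample, 
sampleInIteration) and the way the seed is mixed with the iteration number are 
assumptions made purely for illustration. It only shows a seeded sampler 
without replacement next to a variant that takes per-iteration input, so that 
it cannot be answered from a result computed once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative only: plain Java, no Flink types. All names are hypothetical.
final class IterationSampling {

    // Sampling without replacement (Bernoulli): each element is kept
    // independently with probability `fraction`. The same seed always
    // yields the same sample.
    static <T> List<T> sample(List<T> data, double fraction, long seed) {
        Random rnd = new Random(seed);
        List<T> out = new ArrayList<>();
        for (T item : data) {
            if (rnd.nextDouble() < fraction) {
                out.add(item);
            }
        }
        return out;
    }

    // Variant meant for use inside an iteration: the iteration number is an
    // explicit input, so the sample has to be recomputed for every iteration
    // while staying reproducible for a fixed (seed, iteration) pair.
    // Assumption: mixing the seed as seed * 31 + iteration is just one
    // simple choice, not a recommendation.
    static <T> List<T> sampleInIteration(List<T> data, double fraction,
                                         long seed, int iteration) {
        return sample(data, fraction, seed * 31 + iteration);
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            data.add(i);
        }
        // Draw one sample per iteration and print it.
        for (int iteration = 0; iteration < 3; iteration++) {
            System.out.println("iteration " + iteration + ": "
                    + sampleInIteration(data, 0.3, 42L, iteration));
        }
    }
}
```

In Flink the per-iteration input would be the broadcast current solution 
rather than an explicit iteration counter; the counter here only stands in for 
"something that changes every iteration".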

> Create sample operator for Dataset
> ----------------------------------
>
>                 Key: FLINK-1901
>                 URL: https://issues.apache.org/jira/browse/FLINK-1901
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Theodore Vasiloudis
>            Assignee: Chengxiang Li
>
> In order to be able to implement Stochastic Gradient Descent and a number of 
> other machine learning algorithms we need to have a way to take a random 
> sample from a Dataset.
> We need to be able to sample with or without replacement from the Dataset, 
> choose the relative size of the sample, and set a seed for reproducibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
