[ 
https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645658#comment-14645658
 ] 

ASF GitHub Bot commented on FLINK-1901:
---------------------------------------

GitHub user ChengXiangLi opened a pull request:

    https://github.com/apache/flink/pull/949

    [FLINK-1901] [core] Create sample operator for Dataset.

    This PR includes:
    1. Four random sampler implementations for different sampling strategies.
    2. A sample operator for the DataSet Java API.
    3. Unit tests for the random samplers.
    4. An integration test for the sample operator in the Java API.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ChengXiangLi/flink FLINK-1901

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/949.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #949
    
----
commit f7ba8779b8d6a6d66ab5d4e2435a70e220b1e0fc
Author: chengxiang li <chengxiang...@intel.com>
Date:   2015-07-22T03:38:13Z

    [FLINK-1901] [core] Create sample operator for Dataset.

----


> Create sample operator for Dataset
> ----------------------------------
>
>                 Key: FLINK-1901
>                 URL: https://issues.apache.org/jira/browse/FLINK-1901
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Theodore Vasiloudis
>            Assignee: Chengxiang Li
>
> To implement Stochastic Gradient Descent and a number of other machine 
> learning algorithms, we need a way to take a random sample from a Dataset.
> The sampling must support drawing with or without replacement, choosing 
> the relative size of the sample, and setting a seed for reproducibility.
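
The requirements in the issue description (with or without replacement, a relative sample size, a seed) can be illustrated with a minimal standalone sketch. It is not the code from pull request #949 and does not use any Flink classes; the class name `SamplingSketch` and its method names are hypothetical. It assumes Bernoulli sampling for the without-replacement case and per-element Poisson counts for the with-replacement case, which are common choices for single-pass sampling but may differ from what the PR implements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; not the sampler classes added by the pull request.
final class SamplingSketch {

    // Bernoulli sampling: keep each element independently with probability `fraction`,
    // so no element can appear more often in the output than in the input.
    static <T> List<T> sampleWithoutReplacement(List<T> input, double fraction, long seed) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        Random rng = new Random(seed);
        List<T> out = new ArrayList<>();
        for (T element : input) {
            if (rng.nextDouble() < fraction) {
                out.add(element);
            }
        }
        return out;
    }

    // Poisson sampling: emit each element k times, with k drawn from Poisson(fraction),
    // so the same element may appear several times in the output.
    static <T> List<T> sampleWithReplacement(List<T> input, double fraction, long seed) {
        if (fraction < 0.0) {
            throw new IllegalArgumentException("fraction must be non-negative");
        }
        Random rng = new Random(seed);
        List<T> out = new ArrayList<>();
        for (T element : input) {
            int copies = nextPoisson(fraction, rng);
            for (int i = 0; i < copies; i++) {
                out.add(element);
            }
        }
        return out;
    }

    // Knuth's multiplication method; adequate for the small means used in this sketch.
    private static int nextPoisson(double mean, Random rng) {
        double limit = Math.exp(-mean);
        int k = 0;
        double product = 1.0;
        do {
            k++;
            product *= rng.nextDouble();
        } while (product > limit);
        return k - 1;
    }
}
```

With the same seed, repeated calls return the same sample. In both methods the size of the result is only approximately `fraction * input.size()`, because each element is decided independently.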



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
