GitHub user rawkintrevo opened a pull request:

    https://github.com/apache/flink/pull/1898

    [FLINK-2259][ml] Add Train-Testing Splitters

    This PR adds an object in ml/pipeline called splitter with the following methods:
    
    randomSplit: Splits a DataSet into two DataSets using DataSet.sample
    multiRandomSplit: Splits a DataSet into multiple DataSets according to an array of proportions
    kFoldSplit: Splits a DataSet into k TrainTest objects; each has a testing data set of size 1/k of the original DataSet, and the remainder forms the training data set
    trainTestSplit: A wrapper for randomSplit that returns a TrainTest object
    trainTestHoldoutSplit: A wrapper for multiRandomSplit that returns a TrainTestHoldout object
    
    The TrainTest and TrainTestHoldout objects are case classes. randomSplit and multiRandomSplit return arrays of DataSets.
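    
    To illustrate the k-fold behaviour described above, here is a minimal sketch on plain Scala collections. It is not the code from this PR: SplitterSketch, its kFoldSplit signature, and the Seq-based TrainTest below are hypothetical stand-ins (the PR's versions operate on Flink DataSets), and the shuffle-then-modulo fold assignment is an assumption chosen so that fold sizes differ by at most one.

```scala
import scala.util.Random

// Hypothetical stand-in for the PR's TrainTest case class,
// holding plain Seqs instead of Flink DataSets.
final case class TrainTest[T](training: Seq[T], testing: Seq[T])

object SplitterSketch {
  // Shuffle once, then let every k-th element (by position) belong to the
  // same fold. Each fold serves once as the testing set while the
  // remaining k - 1 folds form the training set.
  def kFoldSplit[T](data: Seq[T], k: Int, seed: Long): Seq[TrainTest[T]] = {
    require(k >= 2, "k must be at least 2")
    val indexed = new Random(seed).shuffle(data).zipWithIndex
    (0 until k).map { fold =>
      val (test, train) = indexed.partition { case (_, i) => i % k == fold }
      TrainTest(train.map(_._1), test.map(_._1))
    }
  }
}
```

    For example, SplitterSketch.kFoldSplit(1 to 10, k = 5, seed = 42L) yields five TrainTest values, each with 2 testing and 8 training elements, and every element appears in exactly one testing set.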
    
    - [x] General
      
    - [ ] Documentation
      - Documentation is currently in the code; the Markdown docs will be written after review, feedback, and finalization
    
    - [x] Tests & Build


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rawkintrevo/flink train-test-split

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1898.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1898
    
----
commit ec1e65a31d80b33589b73619f2a5dd0a8e09c568
Author: Trevor Grant <[email protected]>
Date:   2016-04-15T22:37:51Z

    Add Splitter Pre-processing

commit 3ecdc3818dd11a847136510dabe96f444924d319
Author: Trevor Grant <[email protected]>
Date:   2016-04-15T22:40:38Z

    Add Splitter Pre-processing

----


