[ https://issues.apache.org/jira/browse/FLINK-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144606#comment-15144606 ]

Trevor Grant commented on FLINK-2259:
-------------------------------------

In the spirit of scikit-learn, and of data tooling in general:

The plan of attack is to write a 2-way random split (with configurable split 
options), then build 2 types of wrappers on top of it:

One for a train/validation/test style split,

Another for k-fold cross-validation, which requires some more functionality 
but is the more 'grown-up' style.

So basically I plan on writing one function, and then another function just 
to wrap the first. That may seem silly, but anticipating k-fold support is my 
motivation.
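The design described above (one base 2-way random split, plus two thin wrappers) can be sketched in Scala, the language Flink ML is written in. This is a minimal sketch under stated assumptions. The names `SplitSketch`, `randomSplit`, `trainValidationTestSplit` and `kFoldSplit` are hypothetical. The helpers also work on in-memory `Seq`s rather than Flink `DataSet`s, so they show the structure only and are not the Flink ML API. The k-fold wrapper reuses the 2-way split by peeling off 1/k of the data, then 1/(k-1) of what remains, and so on, so every element lands in exactly one fold.

```scala
import scala.util.Random

// Hypothetical helpers illustrating "one base split plus thin wrappers".
// They operate on in-memory Seqs; this is NOT the Flink ML API.
object SplitSketch {

  /** Base 2-way random split: each element goes to the first part with
    * probability `fraction`, so part sizes are only approximately
    * proportional. A fixed `seed` makes the split reproducible. */
  def randomSplit[T](data: Seq[T], fraction: Double, seed: Long): (Seq[T], Seq[T]) = {
    require(fraction >= 0.0 && fraction <= 1.0, "fraction must be in [0, 1]")
    val rnd = new Random(seed)
    data.partition(_ => rnd.nextDouble() < fraction)
  }

  /** Wrapper 1: train/validation/test, built from two calls to randomSplit.
    * `trainFrac` and `validFrac` are fractions of the whole data set. */
  def trainValidationTestSplit[T](data: Seq[T], trainFrac: Double, validFrac: Double,
                                  seed: Long): (Seq[T], Seq[T], Seq[T]) = {
    require(trainFrac >= 0.0 && validFrac >= 0.0 && trainFrac + validFrac <= 1.0,
      "fractions must be non-negative and sum to at most 1")
    val (train, rest) = randomSplit(data, trainFrac, seed)
    // Rescale so that validFrac stays relative to the original data size.
    val restFrac = 1.0 - trainFrac
    val validOfRest = if (restFrac <= 0.0) 0.0 else math.min(1.0, validFrac / restFrac)
    val (valid, test) = randomSplit(rest, validOfRest, seed + 1)
    (train, valid, test)
  }

  /** Wrapper 2: k-fold. Folds are peeled off with randomSplit (1/k of the data,
    * then 1/(k-1) of the remainder, ...), so each element lands in exactly one
    * fold. Returns k (train, test) pairs: test is fold i, train is all other folds. */
  def kFoldSplit[T](data: Seq[T], k: Int, seed: Long): Seq[(Seq[T], Seq[T])] = {
    require(k >= 2, "k must be at least 2")
    val (folds, _) = (0 until k).foldLeft((Vector.empty[Seq[T]], data)) {
      case ((acc, remaining), i) =>
        if (i == k - 1) (acc :+ remaining, Seq.empty[T]) // last fold takes the rest
        else {
          val (fold, rest) = randomSplit(remaining, 1.0 / (k - i), seed + i)
          (acc :+ fold, rest)
        }
    }
    folds.indices.map { i =>
      val train = folds.indices.filter(_ != i).flatMap(j => folds(j))
      (train, folds(i))
    }
  }
}
```

Because the base split draws one random number per element, the part sizes are only approximately proportional to the requested fractions. A fixed seed keeps runs reproducible.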

> Support training Estimators using a (train, validation, test) split of the 
> available data
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-2259
>                 URL: https://issues.apache.org/jira/browse/FLINK-2259
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Theodore Vasiloudis
>            Assignee: Trevor Grant
>            Priority: Minor
>              Labels: ML
>
> When there is an abundance of data available, a good way to train models is 
> to split the available data into 3 parts: Train, Validation and Test.
> We use the Train data to train the model, the Validation part to 
> estimate the test error and select hyperparameters, and the Test part to 
> evaluate the performance of the model and assess its generalization [1].
> This is a common approach when training Artificial Neural Networks, and a 
> good strategy to choose in data-rich environments. Therefore we should have 
> some support for this data-analysis process in our Estimators.
> [1] Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. The Elements of 
> Statistical Learning. Vol. 1. Berlin: Springer Series in Statistics, 
> Springer, 2001.
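The following small Scala sketch makes the train/validation/test workflow from the issue concrete. It uses 1-D ridge regression through the origin (y ≈ w * x) as a stand-in model. Each candidate regularization value `lambda` is fit on the Train part, and the one with the lowest Validation error is kept. Only that single model is then scored on the Test part. The model choice, the `ValidationSketch` object and its helper names are assumptions made for illustration and are not part of Flink ML.

```scala
// Hypothetical sketch of hyperparameter selection with a train/validation/test
// split. The model is a stand-in, not a Flink ML Estimator.
object ValidationSketch {
  type Point = (Double, Double) // (x, y)

  // Closed-form ridge fit through the origin, minimizing
  // sum((y - w*x)^2) + lambda * w^2  =>  w = sum(x*y) / (sum(x*x) + lambda).
  def fit(train: Seq[Point], lambda: Double): Double = {
    val sxy = train.map { case (x, y) => x * y }.sum
    val sxx = train.map { case (x, _) => x * x }.sum
    sxy / (sxx + lambda)
  }

  // Mean squared error of the model y ≈ w * x on a non-empty data set.
  def mse(w: Double, data: Seq[Point]): Double = {
    require(data.nonEmpty, "cannot evaluate on an empty data set")
    data.map { case (x, y) => math.pow(y - w * x, 2) }.sum / data.size
  }

  /** Train one model per candidate lambda on `train`, keep the lambda with the
    * lowest error on `valid`, and evaluate only that model on `test`.
    * Returns (chosen lambda, test error). */
  def selectAndEvaluate(train: Seq[Point], valid: Seq[Point], test: Seq[Point],
                        lambdas: Seq[Double]): (Double, Double) = {
    val bestLambda = lambdas.minBy(l => mse(fit(train, l), valid))
    val testError = mse(fit(train, bestLambda), test)
    (bestLambda, testError)
  }
}
```

Before calling `selectAndEvaluate`, a caller would split the data into three parts. For example, it could shuffle with a seeded `scala.util.Random` and slice 60/20/20. Keeping the Test part out of the `lambda` search is what lets its error serve as an honest estimate of generalization.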



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
