[ https://issues.apache.org/jira/browse/FLINK-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15286371#comment-15286371 ]
ASF GitHub Bot commented on FLINK-2259:
---------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1898#discussion_r63498454

    --- Diff: docs/apis/batch/libs/ml/index.md ---
    @@ -86,10 +87,18 @@ Now you can start solving your analysis task.
     The following code snippet shows how easy it is to train a multiple linear regression model.

     {% highlight scala %}
    +
    +
    --- End diff --

    Why inserting two line breaks here?


> Support training Estimators using a (train, validation, test) split of the available data
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-2259
>                 URL: https://issues.apache.org/jira/browse/FLINK-2259
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Theodore Vasiloudis
>            Assignee: Trevor Grant
>            Priority: Minor
>              Labels: ML
>
> When there is an abundance of data available, a good way to train models is
> to split the available data into 3 parts: Train, Validation and Test.
> We use the Train data to train the model, the Validation part is used to
> estimate the test error and select hyperparameters, and the Test is used to
> evaluate the performance of the model, and assess its generalization [1]
> This is a common approach when training Artificial Neural Networks, and a
> good strategy to choose in data-rich environments. Therefore we should have
> some support of this data-analysis process in our Estimators.
> [1] Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. The elements of
> statistical learning. Vol. 1. Springer, Berlin: Springer series in
> statistics, 2001.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
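
For readers unfamiliar with the (train, validation, test) split described in the issue, here is a minimal, library-independent sketch in plain Python. It is not the FlinkML API and not the code under review in the pull request. The function name `train_validation_test_split`, its parameters, and the 60/20/20 default fractions are assumptions made purely for illustration.

```python
import random


def train_validation_test_split(data, train_frac=0.6, validation_frac=0.2, seed=0):
    """Shuffle `data` and cut it into three disjoint lists.

    Hypothetical helper for illustration only; the name, the signature
    and the default fractions are assumptions, not part of any library.
    Whatever is left after the train and validation parts becomes the
    test part.
    """
    if train_frac < 0 or validation_frac < 0 or train_frac + validation_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")

    items = list(data)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(items)

    n_train = int(len(items) * train_frac)
    n_validation = int(len(items) * validation_frac)

    train = items[:n_train]
    validation = items[n_train:n_train + n_validation]
    test = items[n_train + n_validation:]
    return train, validation, test


if __name__ == "__main__":
    train, validation, test = train_validation_test_split(range(100))
    print(len(train), len(validation), len(test))
```

Under this scheme a model would be fitted on `train`, hyperparameters would be chosen by comparing errors on `validation`, and `test` would be touched only once at the end to estimate how well the chosen model generalizes.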