[
https://issues.apache.org/jira/browse/FLINK-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253954#comment-15253954
]
ASF GitHub Bot commented on FLINK-2259:
---------------------------------------
Github user tillrohrmann commented on the pull request:
https://github.com/apache/flink/pull/1898#issuecomment-213434810
Thanks for your contribution @rawkintrevo. Good work. I had some minor
inline comments. I'm mainly concerned about the efficiency of
`multiRandomSplit` because it can construct some really long pipelines.
I think we should also add online documentation for the `Splitter`.
Otherwise people will just miss it. You can take a look at `docs/libs/ml/` and
create a web page for the splitter. We could then, for example, add a tools
page that links to the `Splitter`.
> Support training Estimators using a (train, validation, test) split of the
> available data
> -----------------------------------------------------------------------------------------
>
> Key: FLINK-2259
> URL: https://issues.apache.org/jira/browse/FLINK-2259
> Project: Flink
> Issue Type: New Feature
> Components: Machine Learning Library
> Reporter: Theodore Vasiloudis
> Assignee: Trevor Grant
> Priority: Minor
> Labels: ML
>
> When there is an abundance of data available, a good way to train models is
> to split the available data into 3 parts: Train, Validation and Test.
> We use the Train part to train the model, the Validation part to
> estimate the test error and select hyperparameters, and the Test part to
> evaluate the performance of the model and assess its generalization [1].
> This is a common approach when training Artificial Neural Networks, and a
> good strategy to choose in data-rich environments. Therefore we should have
> some support for this data-analysis process in our Estimators.
> [1] Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. The Elements of
> Statistical Learning. Vol. 1. Springer Series in Statistics. Berlin:
> Springer, 2001.
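
The three-way split described in the issue can be illustrated with a minimal
sketch in plain Python. It does not use the Flink API and is not the `Splitter`
from the pull request; the function name, the default fractions, and the seed
are assumptions chosen for illustration. Each record gets one random draw and
lands in exactly one part, so the split is done in a single pass.

```python
import random


def train_validation_test_split(records, fractions=(0.6, 0.2, 0.2), seed=42):
    """Assign each record to exactly one of three parts in a single pass.

    Hypothetical helper for illustration only; the name and defaults are
    assumptions, not part of Flink.
    """
    if len(fractions) != 3 or abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must be three values summing to 1")

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train_cut = fractions[0]
    validation_cut = fractions[0] + fractions[1]

    train, validation, test = [], [], []
    for record in records:
        draw = rng.random()  # uniform in [0, 1)
        if draw < train_cut:
            train.append(record)       # used to fit the model
        elif draw < validation_cut:
            validation.append(record)  # used to select hyperparameters
        else:
            test.append(record)        # held out for the final evaluation
    return train, validation, test


train, validation, test = train_validation_test_split(range(1000))
```

Because the assignment is random, the part sizes only approximate the
requested fractions; an exact-size split would need a shuffle followed by
slicing instead.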