[ https://issues.apache.org/jira/browse/SPARK-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337742#comment-14337742 ]
Liang-Chi Hsieh commented on SPARK-6004:
----------------------------------------
Stopping training early makes sense for convergence problems. For choosing the
number of iterations, though, it is more common to tune it by monitoring the
error/performance curve against the iteration number.
It would be great if we could stop early and get the best model without wasting
more compute time. But we know that the validation error does not change
monotonically. So if you stop at 20 iterations, how do you know the model will
not improve again at the next iteration? Stopping training just because the
validation error did not improve over the previous iteration is too crude.
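As a rough illustration of what picking the best model could look like
(pickBestModel is a hypothetical helper name; this sketch assumes the
evaluateEachIteration method on GradientBoostedTreesModel and squared-error
loss, and is not the proposed patch):
{code:scala}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.GradientBoostedTrees
import org.apache.spark.mllib.tree.configuration.BoostingStrategy
import org.apache.spark.mllib.tree.loss.SquaredError
import org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
import org.apache.spark.rdd.RDD

// Train for the full number of iterations, then keep only the trees up to the
// iteration with the lowest validation error, instead of stopping at the first
// iteration that fails to improve.
def pickBestModel(training: RDD[LabeledPoint],
                  validation: RDD[LabeledPoint],
                  numIterations: Int): GradientBoostedTreesModel = {
  val boostingStrategy = BoostingStrategy.defaultParams("Regression")
  boostingStrategy.numIterations = numIterations

  val fullModel = new GradientBoostedTrees(boostingStrategy).run(training)

  // Validation error after each boosting iteration; in general not monotonic.
  val errors = fullModel.evaluateEachIteration(validation, SquaredError)
  val bestNumTrees = errors.indexOf(errors.min) + 1

  new GradientBoostedTreesModel(
    fullModel.algo,
    fullModel.trees.take(bestNumTrees),
    fullModel.treeWeights.take(bestNumTrees))
}
{code}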
I think keeping validationTol is useful: it lets users know where the best
model is located among the training iterations, so they don't need to plot the
error/performance curve on the validation dataset themselves. My concern is
only about the default behavior of stopping training early.
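For reference, the current early-stopping path being discussed looks roughly
like the following (variable and function names are illustrative; validationTol
is the existing BoostingStrategy field used by runWithValidation):
{code:scala}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.GradientBoostedTrees
import org.apache.spark.mllib.tree.configuration.BoostingStrategy
import org.apache.spark.mllib.tree.model.GradientBoostedTreesModel
import org.apache.spark.rdd.RDD

// runWithValidation stops adding trees once the gain in validation error drops
// below validationTol, so the returned model is the one at the stopping point,
// not necessarily the one with the lowest validation error overall.
def trainWithEarlyStopping(training: RDD[LabeledPoint],
                           validation: RDD[LabeledPoint]): GradientBoostedTreesModel = {
  val boostingStrategy = BoostingStrategy.defaultParams("Regression")
  boostingStrategy.numIterations = 100
  boostingStrategy.validationTol = 1e-3  // threshold for the early-stopping check
  new GradientBoostedTrees(boostingStrategy).runWithValidation(training, validation)
}
{code}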
> Pick the best model when training GradientBoostedTrees with validation
> ----------------------------------------------------------------------
>
> Key: SPARK-6004
> URL: https://issues.apache.org/jira/browse/SPARK-6004
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Reporter: Liang-Chi Hsieh
> Priority: Minor
>
> Since the validation error does not change monotonically, in practice it
> would be better to pick the best model when training GradientBoostedTrees
> with validation, instead of stopping training early.