[ https://issues.apache.org/jira/browse/SPARK-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358983#comment-15358983 ]
Seth Hendrickson commented on SPARK-4240:
-----------------------------------------
I did some work on this in the past, but haven't looked at it for a while
now. I may have time to pick it back up in a few weeks, but if you are
interested in working on it, feel free (please say so here first, though).
Thanks!
> Refine Tree Predictions in Gradient Boosting to Improve Prediction Accuracy.
> ----------------------------------------------------------------------------
>
> Key: SPARK-4240
> URL: https://issues.apache.org/jira/browse/SPARK-4240
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Affects Versions: 1.3.0
> Reporter: Sung Chung
>
> Gradient boosting as currently implemented estimates the loss gradient in
> each iteration using regression trees. At every iteration, the regression
> trees are trained/split to minimize the variance of the predicted gradient.
> Additionally, the terminal node predictions are computed to minimize the
> prediction variance, i.e. each terminal node predicts the mean gradient of
> its samples.
> However, such predictions won't be optimal for loss functions other than the
> mean-squared error. The TreeBoosting refinement can help mitigate this issue
> by modifying terminal node prediction values so that those predictions would
> directly minimize the actual loss function. Although this still doesn't
> change the fact that the tree splits were done through variance reduction, it
> should still lead to improvement in gradient estimations, and thus better
> performance.
> The details of this can be found in the R vignette. This paper also shows how
> to refine the terminal node predictions.
> http://www.saedsayad.com/docs/gbm2.pdf
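
For readers following the thread, here is a minimal sketch of the terminal node refinement described in the issue. It assumes absolute-error loss, a single terminal node, and a learning rate of 1. All function names are hypothetical and none of this is Spark MLlib code. It compares the variance-minimizing leaf value (the mean of the negative gradients) with the refined value that directly minimizes the loss over the samples in the node (for absolute error, the median of the residuals).

```python
import statistics


def absolute_loss(y, pred):
    """Total absolute-error loss: sum of |y_i - F(x_i)|."""
    return sum(abs(yi - pi) for yi, pi in zip(y, pred))


def negative_gradient(y, pred):
    """Negative gradient of |y - F| with respect to F, i.e. sign(y - F)."""
    return [(yi > pi) - (yi < pi) for yi, pi in zip(y, pred)]


def leaf_value_mean_gradient(y, pred):
    """Variance-minimizing leaf value: mean of the negative gradients."""
    g = negative_gradient(y, pred)
    return sum(g) / len(g)


def leaf_value_refined(y, pred):
    """Refined leaf value: the constant gamma minimizing
    sum |y_i - (F(x_i) + gamma)|, which is the median residual."""
    return statistics.median(yi - pi for yi, pi in zip(y, pred))


def shift(pred, value):
    """Apply a leaf value to the current predictions (learning rate 1)."""
    return [pi + value for pi in pred]


# Hypothetical labels and current model predictions for the samples
# that fall into one terminal node.
labels = [1.0, 2.0, 10.0]
current = [3.0, 3.0, 3.0]

unrefined = shift(current, leaf_value_mean_gradient(labels, current))
refined = shift(current, leaf_value_refined(labels, current))

print("loss with mean-gradient leaf value:", absolute_loss(labels, unrefined))
print("loss with refined leaf value:      ", absolute_loss(labels, refined))
```

The tree structure is untouched in both cases. Only the value stored in the terminal node changes, which matches the point above that the splits are still chosen by variance reduction.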