[ 
https://issues.apache.org/jira/browse/SPARK-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992764#comment-14992764
 ] 

Seth Hendrickson commented on SPARK-4240:
-----------------------------------------

I think we should create a separate JIRA, blocking this one, for moving the 
GBT implementation to ml. Once that's done, we can implement the TreeBoost 
modification to GBTs.

I can create the JIRA and begin work on it if we decide that it's appropriate. 
Note that this would be very similar to [PR 
7294|https://github.com/apache/spark/pull/7294/]. I'd like to continue working 
on this JIRA once the implementation has been moved since I spent some time on 
it already :)

ping [~josephkb] [~dbtsai] [~jbabcock]

> Refine Tree Predictions in Gradient Boosting to Improve Prediction Accuracy.
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-4240
>                 URL: https://issues.apache.org/jira/browse/SPARK-4240
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Sung Chung
>
> Gradient boosting as currently implemented estimates the loss gradient in 
> each iteration using regression trees. At every iteration, the regression 
> trees are trained/split to minimize the variance of the predicted gradient. 
> Additionally, the terminal node predictions are computed to minimize the 
> prediction variance.
> However, such predictions are not optimal for loss functions other than 
> mean squared error. The TreeBoosting refinement can help mitigate this issue 
> by modifying the terminal node prediction values so that they directly 
> minimize the actual loss function. Although this doesn't change the fact 
> that the tree splits are chosen through variance reduction, it should still 
> improve the gradient estimates, and thus performance.
> The details of this can be found in the R vignette. This paper also shows how 
> to refine the terminal node predictions.
> http://www.saedsayad.com/docs/gbm2.pdf
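
To make the refinement described in the quoted issue concrete, here is a
minimal sketch for the absolute error loss. It is not Spark's API: the
function names are hypothetical, the learning rate is assumed to be 1.0, and
the tree structure is assumed to be already fixed (each example carries the
id of the leaf it fell into). The variance-minimizing leaf value is the mean
of the pseudo-residuals sign(y - F), while the refined leaf value is the
median of the residuals y - F, which minimizes the absolute error within the
leaf.

```python
from collections import defaultdict
from statistics import mean, median


def sign(x):
    """Return -1.0, 0.0, or 1.0 according to the sign of x."""
    return (x > 0) - (x < 0) + 0.0


def gradient_leaf_values(leaf_ids, labels, current_preds):
    """Leaf value = mean pseudo-residual (variance-minimizing fit).

    For absolute error, the pseudo-residual is sign(y - F).
    """
    grads = defaultdict(list)
    for leaf, y, f in zip(leaf_ids, labels, current_preds):
        grads[leaf].append(sign(y - f))
    return {leaf: mean(g) for leaf, g in grads.items()}


def refined_leaf_values(leaf_ids, labels, current_preds):
    """Leaf value = median residual, which minimizes sum |y - (F + value)|."""
    residuals = defaultdict(list)
    for leaf, y, f in zip(leaf_ids, labels, current_preds):
        residuals[leaf].append(y - f)
    return {leaf: median(r) for leaf, r in residuals.items()}


def absolute_loss(leaf_ids, labels, current_preds, leaf_values):
    """Total absolute error after adding the leaf values (learning rate 1.0)."""
    return sum(
        abs(y - (f + leaf_values[leaf]))
        for leaf, y, f in zip(leaf_ids, labels, current_preds)
    )


# Toy data (made up for illustration): two leaves, three examples each.
leaf_ids = [0, 0, 0, 1, 1, 1]
labels = [1.0, 2.0, 10.0, 5.0, 5.5, 6.0]
current_preds = [0.0] * 6  # predictions of the ensemble so far

grad_values = gradient_leaf_values(leaf_ids, labels, current_preds)
refined_values = refined_leaf_values(leaf_ids, labels, current_preds)

loss_gradient = absolute_loss(leaf_ids, labels, current_preds, grad_values)
loss_refined = absolute_loss(leaf_ids, labels, current_preds, refined_values)
print(grad_values, loss_gradient)
print(refined_values, loss_refined)
```

The splits are identical in both cases; only the terminal node values
differ, which is why the refinement can be applied as a post-processing step
on each fitted tree.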


