Joseph K. Bradley created SPARK-5870:
----------------------------------------
Summary: GradientBoostedTrees should cache residuals from partial model
Key: SPARK-5870
URL: https://issues.apache.org/jira/browse/SPARK-5870
Project: Spark
Issue Type: Improvement
Components: MLlib
Affects Versions: 1.3.0
Reporter: Joseph K. Bradley
On each iteration, GradientBoostedTrees computes predictions for each training
instance using the partial model. This means it re-computes the prediction of
every earlier tree on each following iteration, so the total number of tree
evaluations per instance is O(numIterations^2) instead of O(numIterations).
It should instead cache the current residuals and update them with the
predictions from the newest tree on each iteration.
This will likely speed things up when using small trees (where training trees
is fastest). For large trees, training may be costly enough to dominate the
cost of re-computing predictions on each iteration, so the gain will be smaller.
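
The difference between the two approaches can be sketched in a few lines of
plain Python. This is an illustration only, not Spark MLlib code: every name
below (fit_stump, predict_stump, boost_recompute, boost_cached) is
hypothetical, the "trees" are one-split stumps on a single feature, and the
loss is assumed to be squared error, so the residual is simply label minus
current prediction. A counter records how many single-tree predictions each
variant makes.

```python
# Illustrative sketch only; not Spark MLlib code. All names are hypothetical.

def fit_stump(xs, residuals):
    """Fit a one-split regression stump by least squares.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_value, right_value)


def predict_stump(stump, x, counter):
    """Predict with one stump and count the evaluation."""
    counter[0] += 1
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost_recompute(xs, ys, num_iterations, learning_rate=0.5):
    """Re-evaluate every earlier tree on every iteration."""
    counter, trees = [0], []
    for _ in range(num_iterations):
        preds = [sum(learning_rate * predict_stump(t, x, counter) for t in trees)
                 for x in xs]
        residuals = [y - p for y, p in zip(ys, preds)]
        trees.append(fit_stump(xs, residuals))
    return trees, counter[0]


def boost_cached(xs, ys, num_iterations, learning_rate=0.5):
    """Keep running predictions and update them with the newest tree only."""
    counter, trees = [0], []
    preds = [0.0] * len(xs)  # cached prediction of the partial model
    for _ in range(num_iterations):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * predict_stump(tree, x, counter)
                 for p, x in zip(preds, xs)]
    return trees, counter[0]


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.1, 3.9, 5.2, 6.1]
_, recompute_evals = boost_recompute(xs, ys, num_iterations=10)
_, cached_evals = boost_cached(xs, ys, num_iterations=10)
print(recompute_evals, cached_evals)  # prints "270 60"
```

With 6 instances and 10 iterations, the recomputing variant makes
6 * (0 + 1 + ... + 9) = 270 single-tree predictions, while the cached variant
makes 6 * 10 = 60. In a distributed setting the cached predictions would
themselves have to be stored and kept in step with the training data, which
this sketch does not address.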