[ https://issues.apache.org/jira/browse/SPARK-5972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391521#comment-14391521 ]
Manoj Kumar commented on SPARK-5972:
------------------------------------

[~josephkb] This should be done independently of evaluateEachIteration, right? (In the sense that evaluateEachIteration should not be used in the GradientBoostedTrees code that does this caching of the error and residuals, since the model has not been trained yet.)

> Cache residuals for GradientBoostedTrees during training
> --------------------------------------------------------
>
>                 Key: SPARK-5972
>                 URL: https://issues.apache.org/jira/browse/SPARK-5972
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>            Priority: Minor
>
> In gradient boosting, the current model's prediction is re-computed for each
> training instance on every iteration. The current residual (cumulative
> prediction of previously trained trees in the ensemble) should be cached.
> That could reduce both computation (only computing the prediction of the most
> recently trained tree) and communication (only sending the most recently
> trained tree to the workers).
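The optimization the issue describes can be sketched in miniature as follows. This is a hedged illustration of the general idea, not Spark MLlib code: `fit_stump`, the squared-error setup, and all names here are illustrative stand-ins. The point is that the cumulative prediction is cached across iterations, so each iteration evaluates only the newest tree instead of the whole ensemble.

```python
# Sketch of residual caching in gradient boosting (illustrative only,
# not the Spark MLlib implementation). With squared error, the negative
# gradient is the residual y - F(x), so caching F(x) avoids re-running
# every previously trained tree on every instance each iteration.

def fit_stump(xs, residuals):
    """Fit a depth-1 'tree': split at the median x, predict mean residual."""
    split = sorted(xs)[len(xs) // 2]
    left = [r for x, r in zip(xs, residuals) if x < split]
    right = [r for x, r in zip(xs, residuals) if x >= split]
    lmean = sum(left) / len(left) if left else 0.0
    rmean = sum(right) / len(right) if right else 0.0
    return lambda x: lmean if x < split else rmean

def gbt_train(xs, ys, num_iters=10, lr=0.5):
    cached_pred = [0.0] * len(xs)  # cached cumulative ensemble prediction
    trees = []
    for _ in range(num_iters):
        # Residuals come straight from the cache; earlier trees are
        # never re-evaluated here.
        residuals = [y - p for y, p in zip(ys, cached_pred)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        # Update the cache with only the newest tree's contribution.
        cached_pred = [p + lr * tree(x) for p, x in zip(cached_pred, xs)]
    return trees, cached_pred
```

In distributed terms (the communication point in the issue description), the cache would live with the training instances on the workers, so each iteration ships only the most recently trained tree rather than the full ensemble.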