Joseph K. Bradley created SPARK-16719:
-----------------------------------------
Summary: RandomForest: communicate fewer trees on each iteration
Key: SPARK-16719
URL: https://issues.apache.org/jira/browse/SPARK-16719
Project: Spark
Issue Type: Improvement
Components: ML
Reporter: Joseph K. Bradley
Assignee: Joseph K. Bradley
RandomForest currently sends the entire forest to each worker on every
iteration, for two reasons: (a) the node queue is FIFO, and (b) the closure
references the entire array of trees ({{topNodes}}). Because of (a), each
iteration handles splits from many different trees, especially early in
learning. Because of (b), all trees are sent explicitly, whether or not the
iteration needs them.
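Point (a) can be illustrated with a small plain-Python toy model. This is only a sketch, not Spark's implementation: the function name fifo_visit_order, the (tree_index, depth) queue entries, and the rule that every node shallower than max_depth splits into two children are all assumptions made for illustration.

```python
from collections import deque


def fifo_visit_order(num_trees, max_depth):
    """Return the tree index of each node in the order a FIFO queue yields it.

    Toy model (an assumption): every node shallower than max_depth splits
    into two children, and each queue entry is a (tree_index, depth) pair.
    """
    queue = deque((tree, 0) for tree in range(num_trees))  # one root per tree
    order = []
    while queue:
        tree, depth = queue.popleft()  # FIFO: oldest node first
        order.append(tree)
        if depth < max_depth:
            # Children go to the back, behind every other tree's pending nodes.
            queue.append((tree, depth + 1))
            queue.append((tree, depth + 1))
    return order


print(fifo_visit_order(num_trees=3, max_depth=1))
# prints [0, 1, 2, 0, 0, 1, 1, 2, 2]
```

In this model every root is dequeued before any child, so a group of nodes taken from the front of the queue early on spans all of the trees.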
Proposal:
(a) Change the RF node queue to be FILO (a stack), so that each iteration
tends to focus on one or a few trees before moving on to others.
(b) Change {{topNodes}} to pass only the trees required on that iteration.
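A rough plain-Python sketch of the two proposed changes follows. It is a toy model under stated assumptions, not the Spark code: filo_visit_order and trees_for_group are hypothetical names, queue entries are assumed to be (tree_index, depth) or (tree_index, node_id) pairs, and every node shallower than max_depth is assumed to split into two children. Only the name topNodes (here top_nodes) comes from the description above.

```python
def filo_visit_order(num_trees, max_depth):
    """Return the tree index of each node in the order a FILO stack yields it."""
    stack = [(tree, 0) for tree in range(num_trees)]  # one root per tree
    order = []
    while stack:
        tree, depth = stack.pop()  # FILO: most recently added node first
        order.append(tree)
        if depth < max_depth:
            # Children land on top of the stack, so the same tree is handled next.
            stack.append((tree, depth + 1))
            stack.append((tree, depth + 1))
    return order


def trees_for_group(top_nodes, group):
    """Select only the trees referenced by this iteration's group of nodes.

    group holds hypothetical (tree_index, node_id) pairs; the result maps each
    needed tree index to its entry in top_nodes.
    """
    needed = sorted({tree for tree, _ in group})
    return {tree: top_nodes[tree] for tree in needed}


print(filo_visit_order(num_trees=3, max_depth=1))
# prints [2, 2, 2, 1, 1, 1, 0, 0, 0]

top_nodes = ["tree0", "tree1", "tree2", "tree3"]  # stand-ins for real trees
group = [(2, 5), (2, 6), (0, 3)]
print(trees_for_group(top_nodes, group))
# prints {0: 'tree0', 2: 'tree2'}
```

In this model the stack finishes one tree before starting the next, and the per-iteration selection leaves out the trees the group never touches. The sketch shows visiting order only; it does not measure actual communication cost.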