[ https://issues.apache.org/jira/browse/SPARK-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiangrui Meng resolved SPARK-3160.
----------------------------------
Resolution: Fixed
Fix Version/s: 1.2.0
Issue resolved by pull request 2341
[https://github.com/apache/spark/pull/2341]
> Simplify DecisionTree data structure for training
> -------------------------------------------------
>
> Key: SPARK-3160
> URL: https://issues.apache.org/jira/browse/SPARK-3160
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Reporter: Joseph K. Bradley
> Assignee: Joseph K. Bradley
> Priority: Minor
> Fix For: 1.2.0
>
>
> Improvement: code clarity
> Currently, we maintain a tree structure, a flat array of nodes, and a
> parentImpurities array.
> Proposed fix: Maintain everything within a growing tree structure.
> This would let us eliminate the flat array of nodes, thus saving storage when
> we do not grow a full tree. It would also potentially make it easier to pass
> subtrees to compute nodes for local training.
> Note:
> * This JIRA used to have this item as well: We could have a “LearningNode
> extends Node” setup where the LearningNode holds metadata for learning (such
> as impurities). The test-time model could be extracted from this
> training-time model, so that extra information (such as impurities) does not
> have to be kept after training.
> * However, this is really a separate issue, so I removed it.
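
For readers skimming the archive, the proposed fix quoted above can be sketched roughly as follows. This is only an illustration, written in Python for brevity rather than MLlib's Scala. The names Node, split, count_nodes and full_tree_slots are hypothetical and are not taken from pull request 2341. The sketch assumes a binary tree with heap-style node ids, with the impurity stored on each node instead of in a separate parentImpurities array.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical training-time node; not MLlib's actual class."""
    node_id: int
    impurity: float  # kept on the node, so no separate parentImpurities array
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def split(self, left_impurity: float, right_impurity: float) -> None:
        """Grow the tree in place by attaching two children (heap-style ids)."""
        self.left = Node(2 * self.node_id, left_impurity)
        self.right = Node(2 * self.node_id + 1, right_impurity)


def count_nodes(node: Optional[Node]) -> int:
    """Number of nodes actually allocated in the growing tree."""
    if node is None:
        return 0
    return 1 + count_nodes(node.left) + count_nodes(node.right)


def full_tree_slots(max_depth: int) -> int:
    """Slots a flat array needs for a full binary tree (root at depth 0)."""
    return 2 ** (max_depth + 1) - 1


# Grow an unbalanced tree of depth 2: only the left branch is split again.
root = Node(node_id=1, impurity=0.5)
root.split(left_impurity=0.3, right_impurity=0.0)
root.left.split(left_impurity=0.1, right_impurity=0.0)

allocated = count_nodes(root)       # nodes the growing tree holds
preallocated = full_tree_slots(2)   # slots a flat array would reserve
```

In this toy case the growing tree holds 5 nodes where a flat array sized for a full depth-2 tree would reserve 7 slots. The gap widens as the maximum depth increases and the tree stays sparse, which is the storage saving the description refers to.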