[ https://issues.apache.org/jira/browse/SPARK-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng resolved SPARK-3160.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.2.0

Issue resolved by pull request 2341
[https://github.com/apache/spark/pull/2341]

> Simplify DecisionTree data structure for training
> -------------------------------------------------
>
>                 Key: SPARK-3160
>                 URL: https://issues.apache.org/jira/browse/SPARK-3160
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Joseph K. Bradley
>            Assignee: Joseph K. Bradley
>            Priority: Minor
>             Fix For: 1.2.0
>
>
> Improvement: code clarity
> Currently, we maintain a tree structure, a flat array of nodes, and a 
> parentImpurities array.
> Proposed fix: Maintain everything within a growing tree structure.
> This would let us eliminate the flat array of nodes, thus saving storage when 
> we do not grow a full tree.  It would also potentially make it easier to pass 
> subtrees to compute nodes for local training.
> Note:
> * This JIRA used to have this item as well: We could have a “LearningNode 
> extends Node” setup where the LearningNode holds metadata for learning (such 
> as impurities).  The test-time model could be extracted from this 
> training-time model, so that extra information (such as impurities) does not 
> have to be kept after training.
> * However, this is really a separate issue, so I removed it.
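
For readers unfamiliar with the proposal, the following is a minimal sketch of the idea, not the code from pull request 2341. It assumes a hypothetical LearningNode class (the name is borrowed from the note above), heap-style node ids, and made-up impurity values. Each node carries its own impurity, so no separate parentImpurities array is needed, and children are allocated only when a node is actually split.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class LearningNode:
    """Hypothetical training-time node; illustrative only, not MLlib's class."""
    node_id: int      # heap-style id: root is 1, children are 2*id and 2*id + 1
    impurity: float   # kept on the node instead of in a parallel array
    left: Optional["LearningNode"] = None
    right: Optional["LearningNode"] = None

    def split(self, left_impurity: float,
              right_impurity: float) -> Tuple["LearningNode", "LearningNode"]:
        """Grow the tree by attaching two children to this node."""
        self.left = LearningNode(2 * self.node_id, left_impurity)
        self.right = LearningNode(2 * self.node_id + 1, right_impurity)
        return self.left, self.right


def count_nodes(node: Optional[LearningNode]) -> int:
    """Number of nodes actually allocated in the growing tree."""
    if node is None:
        return 0
    return 1 + count_nodes(node.left) + count_nodes(node.right)


def flat_array_size(max_depth: int) -> int:
    """Slots a preallocated flat array needs for a full binary tree."""
    return 2 ** (max_depth + 1) - 1


# Grow an unbalanced tree: only the left child of the root is split further.
# The impurity values are arbitrary placeholders.
root = LearningNode(node_id=1, impurity=0.5)
left, right = root.split(left_impurity=0.3, right_impurity=0.0)
left.split(left_impurity=0.1, right_impurity=0.0)

allocated = count_nodes(root)
preallocated = flat_array_size(max_depth=5)
```

In this sketch the growing tree allocates only the nodes that were split into existence, while a flat array sized for the maximum depth reserves a slot for every possible node, which is the storage saving the description refers to when a full tree is not grown.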



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
