[ https://issues.apache.org/jira/browse/SPARK-9190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-9190.
------------------------------
Resolution: Duplicate
[~duc-trung.nguyen] why did you clone this?
> CLONE - Support DecisionTree pruning
> ------------------------------------
>
> Key: SPARK-9190
> URL: https://issues.apache.org/jira/browse/SPARK-9190
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Reporter: Duc Trung NGUYEN
>
> Improvement: accuracy, computation
> Summary: Pruning is a common method for preventing overfitting with decision
> trees. A smart implementation can prune the tree during training in order to
> avoid training parts of the tree that would eventually be pruned anyway.
> DecisionTree does not currently support pruning.
> Pruning: A “pruning” of a tree is a subtree with the same root node, but
> with zero or more branches removed.
> A naive implementation prunes as follows:
> (1) Train a depth K tree using a training set.
> (2) Compute the optimal prediction at each node (including internal nodes)
> based on the training set.
> (3) Take a held-out validation set, and use the tree to make predictions for
> each validation example. This allows one to compute the validation error
> made at each node in the tree, based on the predictions computed in step (2).
> (4) For each pair of leaves with the same parent, compare the total error on
> the validation set made by the leaves’ predictions with the error made by the
> parent’s predictions. Remove the leaves if the parent has lower error.
> A smarter implementation prunes during training, computing the error on the
> validation set made by each node as it is trained. Whenever splitting a node
> into two children increases the validation error, the children are pruned,
> and no more training is required on that branch.
> It is common to use about 1/3 of the data for pruning. Note that pruning is
> important when using a tree directly for prediction. It is less important
> when combining trees via ensemble methods.
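The naive procedure in steps (2)-(4) of the description can be sketched in Python as bottom-up pruning over a small tree structure. The `Node` class, its fields, and the functions `count_validation_errors` and `prune` are hypothetical illustrations, not MLlib's `DecisionTreeModel` API; step (1), training the depth-K tree, is assumed to have already filled in each node's split and its training-set `prediction`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Node:
    # Hypothetical node layout for illustration; not Spark's internal tree model.
    prediction: int                  # step (2): prediction computed from the training set
    feature: Optional[int] = None    # split feature index (None for a leaf)
    threshold: float = 0.0           # examples go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    val_errors: int = 0              # step (3): validation errors made by this node's prediction

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def count_validation_errors(root: Node, validation: Sequence[Tuple[Sequence[float], int]]) -> None:
    """Step (3): route each example down the tree, charging one error to every
    node on its path whose own prediction disagrees with the label.
    Assumes every node's val_errors starts at 0."""
    for x, label in validation:
        node = root
        while node is not None:
            if node.prediction != label:
                node.val_errors += 1
            if node.is_leaf():
                break
            node = node.left if x[node.feature] <= node.threshold else node.right


def prune(node: Node) -> None:
    """Step (4), applied in post-order: remove a pair of sibling leaves when the
    parent alone makes strictly fewer validation errors than the two leaves."""
    if node.is_leaf():
        return
    prune(node.left)
    prune(node.right)
    if node.left.is_leaf() and node.right.is_leaf():
        if node.val_errors < node.left.val_errors + node.right.val_errors:
            node.left = node.right = None
            node.feature = None
```

Pruning in post-order lets removals cascade upward: once both children of a node have become leaves, that node is itself checked as a candidate. Ties keep the split, matching the "lower error" rule in step (4).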
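The "smarter" variant folds the validation check into training itself. The sketch below is a hedged illustration, not a proposed MLlib design: the hypothetical `grow` function builds a small classification tree, chooses splits by training misclassification count (MLlib itself uses impurity measures such as Gini or entropy), and refuses to expand a node when its two candidate children would make more validation errors than the node alone, so the pruned branch is never trained.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Row = Tuple[Sequence[float], int]  # (feature vector, class label)


@dataclass
class Node:
    # Hypothetical node layout for illustration only.
    prediction: int
    feature: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def majority(rows: Sequence[Row], default: int) -> int:
    """Most common label; ties go to the label seen first; `default` if empty."""
    return Counter(y for _, y in rows).most_common(1)[0][0] if rows else default


def errors(rows: Sequence[Row], prediction: int) -> int:
    return sum(1 for _, y in rows if y != prediction)


def partition(rows: Sequence[Row], f: int, t: float) -> Tuple[List[Row], List[Row]]:
    return [r for r in rows if r[0][f] <= t], [r for r in rows if r[0][f] > t]


def best_split(rows: Sequence[Row]) -> Optional[Tuple[int, float]]:
    """Pick the (feature, threshold) minimizing training misclassifications."""
    best = None
    for f in range(len(rows[0][0])):
        # Exclude the largest value so both sides of the split are non-empty.
        for t in sorted({x[f] for x, _ in rows})[:-1]:
            left, right = partition(rows, f, t)
            cost = errors(left, majority(left, 0)) + errors(right, majority(right, 0))
            if best is None or cost < best[0]:
                best = (cost, f, t)
    return None if best is None else (best[1], best[2])


def grow(train: Sequence[Row], val: Sequence[Row], depth: int, max_depth: int, default: int = 0) -> Node:
    pred = majority(train, default)
    node = Node(prediction=pred)
    if depth == max_depth or errors(train, pred) == 0:
        return node
    split = best_split(train)
    if split is None:  # all feature values identical; nothing to split on
        return node
    f, t = split
    train_l, train_r = partition(train, f, t)
    val_l, val_r = partition(val, f, t)
    # Validation error the two candidate children would make vs. this node alone.
    child_err = errors(val_l, majority(train_l, pred)) + errors(val_r, majority(train_r, pred))
    if child_err > errors(val, pred):
        return node  # children increase validation error: prune and stop training here
    node.feature, node.threshold = f, t
    node.left = grow(train_l, val_l, depth + 1, max_depth, pred)
    node.right = grow(train_r, val_r, depth + 1, max_depth, pred)
    return node
```

This check is greedy: a split that does not immediately reduce validation error is never expanded, even if deeper splits beneath it would have helped.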
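Holding out roughly one third of the data for pruning can be sketched with a seeded shuffle. `split_for_pruning` and its defaults are hypothetical; in Spark itself one would more likely call `randomSplit` on an RDD or DataFrame with weights such as `[2.0, 1.0]`, which yields an approximate (not exact) two-to-one split.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_for_pruning(rows: Sequence[T], prune_fraction: float = 1 / 3,
                      seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle deterministically and hold out `prune_fraction` of the rows.

    Returns (training_rows, pruning_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_prune = round(len(shuffled) * prune_fraction)
    return shuffled[n_prune:], shuffled[:n_prune]
```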
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)