[
https://issues.apache.org/jira/browse/SPARK-11730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023140#comment-15023140
]
Seth Hendrickson commented on SPARK-11730:
------------------------------------------
Another consideration is that the current GBT implementation uses MLlib
decision trees, which do not store count information at each node. This means
that we cannot currently weight the information gain by the number of data
points at each node for the feature importance calculation. We could either
drop this weighting from the GBT calculation or wait until the GBT
implementation is moved to spark.ml. I would prefer the latter, since some
other JIRAs also depend on the GBT implementation being moved.
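A minimal sketch of the weighting discussed above, for illustration only. It assumes a hypothetical `Node` type (not Spark's MLlib or spark.ml tree classes) that records the split feature, the split's information gain, and the number of training points reaching the node. The per-tree normalization followed by summing and renormalizing across trees is an assumed scheme, loosely modeled on the Random Forest approach referenced in SPARK-5133, not the actual Spark code. Passing `weighted=False` shows what the GBT calculation would fall back to if count information is unavailable.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node (an assumption, not a Spark class)."""
    feature: Optional[int] = None   # index of the split feature; None for a leaf
    gain: float = 0.0               # information gain of this node's split
    count: int = 0                  # number of training points reaching this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def tree_importance(root: Node, num_features: int, weighted: bool = True) -> List[float]:
    """Sum split gains per feature (optionally scaled by node count), normalized to 1."""
    imp = [0.0] * num_features
    stack = [root]
    while stack:
        node = stack.pop()
        if node.feature is None:
            continue  # leaves do not split, so they contribute nothing
        scale = node.count if weighted else 1  # without counts, every split counts equally
        imp[node.feature] += node.gain * scale
        stack.extend(child for child in (node.left, node.right) if child is not None)
    total = sum(imp)
    return [v / total for v in imp] if total > 0 else imp


def ensemble_importance(trees: List[Node], num_features: int, weighted: bool = True) -> List[float]:
    """Add the per-tree normalized importances, then renormalize the sum to 1."""
    acc = [0.0] * num_features
    for tree in trees:
        for j, v in enumerate(tree_importance(tree, num_features, weighted)):
            acc[j] += v
    total = sum(acc)
    return [v / total for v in acc] if total > 0 else acc


# A root split on feature 0 (small gain, many points) above a split on feature 1
# (large gain, few points).
tree = Node(feature=0, gain=0.1, count=100,
            left=Node(feature=1, gain=0.5, count=10, left=Node(), right=Node()),
            right=Node())
print(ensemble_importance([tree], 2, weighted=True))
print(ensemble_importance([tree], 2, weighted=False))
```

With count weighting, feature 0 ranks first (about 0.67 vs. 0.33). Without it, the ranking flips (about 0.17 vs. 0.83), because a high-gain split that affects only a few points is no longer discounted. This is why dropping the weighting for GBT would change the results rather than merely approximate them.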
> Feature Importance for GBT
> --------------------------
>
> Key: SPARK-11730
> URL: https://issues.apache.org/jira/browse/SPARK-11730
> Project: Spark
> Issue Type: New Feature
> Components: ML, MLlib
> Reporter: Brian Webb
>
> Random Forests have feature importance, but GBTs do not. It would be great if
> we could add feature importance to GBTs as well. Perhaps the code in Random
> Forests can be refactored to apply to both types of ensembles.
> See https://issues.apache.org/jira/browse/SPARK-5133
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)