[ 
https://issues.apache.org/jira/browse/SPARK-11730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023140#comment-15023140
 ] 

Seth Hendrickson commented on SPARK-11730:
------------------------------------------

Another consideration is that the current GBT implementation uses MLlib 
decision trees, which do not store count information at each node. This means 
that we cannot currently weight the information gain by the number of data 
points at each node in the feature importance calculation. We could either 
drop this weighting from the GBT calculation or wait until the GBT 
implementation is moved to spark.ml. I would prefer the latter, since some 
other JIRAs also depend on the GBT implementation being moved. 
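
To make the two options concrete, here is a rough sketch in plain Python. It is 
not Spark code: the Node class, its fields, and the feature_importances 
function are hypothetical names made up for this illustration, and the 
normalization to a sum of 1 is an assumption. The weighted path needs a 
per-node count, which is exactly what is missing today; the unweighted path 
uses only the gain.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node for illustration; not a Spark class."""
    feature: int = -1            # index of the split feature (unused on leaves)
    gain: float = 0.0            # information gain of the split at this node
    count: Optional[int] = None  # data points reaching this node, if stored
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def feature_importances(root: Node, num_features: int,
                        weighted: bool = True) -> List[float]:
    """Sum the gain per split feature over internal nodes, then normalize.

    With weighted=True each gain is multiplied by the node's count, so the
    counts must be present. With weighted=False the gain is used as-is.
    """
    totals = [0.0] * num_features
    stack = [root]
    while stack:
        node = stack.pop()
        if node.left is None or node.right is None:
            continue  # leaf: no split, so no contribution
        if weighted:
            if node.count is None:
                raise ValueError("weighted importance needs per-node counts")
            totals[node.feature] += node.gain * node.count
        else:
            totals[node.feature] += node.gain
        stack.extend([node.left, node.right])
    total = sum(totals)
    return [t / total for t in totals] if total > 0 else totals


# Toy tree: the root splits on feature 0 with 100 points; its left child
# splits on feature 1 with only 10 points.
tree = Node(feature=0, gain=0.5, count=100,
            left=Node(feature=1, gain=0.4, count=10,
                      left=Node(count=4), right=Node(count=6)),
            right=Node(count=90))

weighted = feature_importances(tree, 2, weighted=True)
unweighted = feature_importances(tree, 2, weighted=False)
```

In this toy tree the unweighted version gives feature 1 a share close to that 
of feature 0, even though its split only affects 10 of the 100 points. That is 
the trade-off in dropping the weighting.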

> Feature Importance for GBT
> --------------------------
>
>                 Key: SPARK-11730
>                 URL: https://issues.apache.org/jira/browse/SPARK-11730
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, MLlib
>            Reporter: Brian Webb
>
> Random Forests have feature importance, but GBTs do not. It would be great if 
> we could add feature importance to GBTs as well. Perhaps the code in Random 
> Forests can be refactored to apply to both types of ensembles.
> See https://issues.apache.org/jira/browse/SPARK-5133


