[jira] [Commented] (SPARK-11730) Feature Importance for GBT

Seth Hendrickson (JIRA) Thu, 19 Nov 2015 10:26:47 -0800

    [ https://issues.apache.org/jira/browse/SPARK-11730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014087#comment-15014087 ]


Seth Hendrickson commented on SPARK-11730:
------------------------------------------

I can work on this.

I looked at the feature importance code for random forests, and it seems that 
feature importance for single decision trees is already implemented but not 
exposed in the decision tree APIs. Since GBT feature importance will likely be 
some aggregation of the individual tree importances, I think we'll need to 
expose it for decision trees first. I can create a JIRA to add 
{{featureImportances}} to decision trees.

Regarding how it should be computed, I can verify that scikit-learn computes it 
as the average of the feature importances across all of the trees in the 
ensemble. Judging from the R vignette, I think R does the same. The current 
spark.ml implementation for random forests also averages the importances 
across all trees, but it notes specifically not to do this for GBT. 
[~josephkb] could you clarify this note, and say whether you have something in 
mind that works for GBT? I haven't found a standard way of computing it for GBT 
other than what is in scikit-learn.
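
To make the averaging concrete, here is a minimal Python sketch of the scheme 
described above. It is not the scikit-learn or spark.ml code: it assumes each 
tree already provides a raw importance vector of the same length, and the names 
{{normalize}} and {{average_importances}} are hypothetical.

```python
from typing import List, Sequence


def normalize(values: Sequence[float]) -> List[float]:
    """Scale a vector so it sums to 1; an all-zero vector stays all zero."""
    total = sum(values)
    if total == 0:
        return [0.0] * len(values)
    return [v / total for v in values]


def average_importances(per_tree: Sequence[Sequence[float]]) -> List[float]:
    """Average per-tree importance vectors across an ensemble.

    Each tree's vector is normalized first so every tree contributes
    equally, then the vectors are averaged and the result is normalized.
    Assumption: this is unweighted averaging; it is not a GBT-specific rule.
    """
    if not per_tree:
        raise ValueError("need at least one tree")
    num_features = len(per_tree[0])
    if any(len(tree) != num_features for tree in per_tree):
        raise ValueError("all trees must cover the same features")

    normalized = [normalize(tree) for tree in per_tree]
    averaged = [
        sum(tree[i] for tree in normalized) / len(normalized)
        for i in range(num_features)
    ]
    return normalize(averaged)
```

Whether trees in a GBT should be weighted equally like this is exactly the open 
question, so the unweighted average here is only a baseline for discussion.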


> Feature Importance for GBT
> --------------------------
>
>                 Key: SPARK-11730
>                 URL: https://issues.apache.org/jira/browse/SPARK-11730
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, MLlib
>            Reporter: Brian Webb
>
> Random Forests have feature importance, but GBTs do not. It would be great if 
> we could add feature importance to GBTs as well. Perhaps the code in Random 
> Forests can be refactored to apply to both types of ensembles.
> See https://issues.apache.org/jira/browse/SPARK-5133


