[jira] [Commented] (SPARK-11730) Feature Importance for GBT

Seth Hendrickson (JIRA) Mon, 23 Nov 2015 08:53:27 -0800

    [ https://issues.apache.org/jira/browse/SPARK-11730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022443#comment-15022443 ]


Seth Hendrickson commented on SPARK-11730:
------------------------------------------

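[~josephkb] Please see equations 44 and 45 in the paper by Friedman linked 
below. Equation 44 defines the (squared) importance of a variable within a 
single tree; equation 45 extends it to a collection of trees by taking the 
unweighted average of that quantity across all trees.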

https://statweb.stanford.edu/~jhf/ftp/trebst.pdf

Intuitively, it would make sense to me to incorporate the tree weights into 
the feature importance, but I have found no instance of this adjustment in 
either theory or practice. Since both R and scikit-learn compute feature 
importance according to the method above, I think it makes sense to stick to 
that convention. Your thoughts are appreciated.
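
To make the two options concrete, here is a minimal sketch in Python (not 
Spark code). It assumes each tree has already been reduced to a map from 
feature index to the summed squared improvement of the splits on that 
feature. The function name, the input layout, and the optional tree_weights 
argument are all hypothetical; the weighted branch only illustrates the 
adjustment discussed above and is not taken from the paper.

```python
from typing import Dict, List, Optional


def ensemble_importance(
    per_tree: List[Dict[int, float]],
    num_features: int,
    tree_weights: Optional[List[float]] = None,
) -> List[float]:
    """Combine per-tree (squared) feature importances across an ensemble.

    per_tree[m] maps a feature index to the summed squared improvement of
    all splits on that feature in tree m (hypothetical input layout).
    With tree_weights=None this is the plain average over trees; passing
    weights gives the weighted variant, which is only an illustration.
    """
    if not per_tree:
        raise ValueError("need at least one tree")
    if tree_weights is None:
        # Unweighted case: every tree counts equally.
        tree_weights = [1.0] * len(per_tree)
    if len(tree_weights) != len(per_tree):
        raise ValueError("one weight per tree is required")

    total_weight = sum(tree_weights)
    totals = [0.0] * num_features
    for weight, tree in zip(tree_weights, per_tree):
        for feature, gain in tree.items():
            totals[feature] += weight * gain
    # Features never used in a split keep an importance of 0.0.
    return [t / total_weight for t in totals]


# Two toy trees over three features; feature 2 is never split on.
trees = [{0: 4.0, 1: 2.0}, {0: 2.0}]
unweighted = ensemble_importance(trees, num_features=3)
weighted = ensemble_importance(trees, num_features=3, tree_weights=[1.0, 0.5])
```

The returned values are on the squared scale; taking the square root and 
normalizing are left out to keep the sketch short.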

> Feature Importance for GBT
> --------------------------
>
>                 Key: SPARK-11730
>                 URL: https://issues.apache.org/jira/browse/SPARK-11730
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, MLlib
>            Reporter: Brian Webb
>
> Random Forests have feature importance, but GBTs do not. It would be great if 
> we could add feature importance to GBTs as well. Perhaps the code in Random 
> Forests can be refactored to apply to both types of ensembles.
> See https://issues.apache.org/jira/browse/SPARK-5133


