GitHub user njayaram2 opened a pull request:
https://github.com/apache/madlib/pull/295
Recursive Partitioning: Add function to report importance scores
JIRA: MADLIB-925
This commit adds a new MADlib function (get_var_importance) that reports
variable importance scores for decision tree (DT) and random forest (RF)
models. RF models prior to MADlib 1.15 reported only the out-of-bag (oob)
variable importance score; from 1.15 onwards they also have an impurity
variable importance score. For >=1.15 RF models this function reports
both scores, and for <1.15 RF models it reports only the oob variable
importance score.
When called on a DT model, the function returns the impurity variable
importance score for >=1.15 DT models.
Co-authored-by: Jingyi Mei <[email protected]>
Co-authored-by: Orhan Kislal <[email protected]>
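The version rules in the description above can be summarized with a small
sketch. This is not the MADlib implementation; the names parse_version and
reported_scores, the "RF"/"DT" labels, and the "oob"/"impurity" strings are
hypothetical and chosen only for illustration. The description does not say
what happens for <1.15 DT models, so the sketch assumes nothing is reported
in that case. It also assumes purely numeric dotted version strings.

```python
def parse_version(version):
    """Turn a dotted numeric version string such as '1.15' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def reported_scores(model_type, madlib_version):
    """Return which importance scores the description says are reported.

    model_type is 'RF' (random forest) or 'DT' (decision tree); both labels
    are illustrative, not MADlib identifiers.
    """
    # Impurity importance exists only for models from 1.15 onwards.
    has_impurity = parse_version(madlib_version) >= (1, 15)

    if model_type == "RF":
        # RF models always have the oob score; >=1.15 adds impurity.
        return ("oob", "impurity") if has_impurity else ("oob",)
    if model_type == "DT":
        # Assumption: nothing is reported for <1.15 DT models, since the
        # description only covers >=1.15 DT models.
        return ("impurity",) if has_impurity else ()
    raise ValueError("unknown model type: %r" % (model_type,))


if __name__ == "__main__":
    for model, version in [("RF", "1.14"), ("RF", "1.15"), ("DT", "1.15")]:
        print(model, version, reported_scores(model, version))
```

Versions are compared as integer tuples rather than strings so that, for
example, 1.9 sorts before 1.15.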
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/madlib/madlib feature/output-importance
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/madlib/pull/295.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #295
----
commit 54a4a17915f6ce1ddea6260db2d06fcd0ee50f51
Author: Nandish Jayaram <njayaram@...>
Date: 2018-07-03T19:22:07Z
Recursive Partitioning: Add function to report importance scores
----
---