GitHub user iyerr3 opened a pull request:
https://github.com/apache/madlib/pull/231
RF: Output non-negative importance values
In random forest (RF), variable importance is computed as the difference
in prediction accuracy between the original out-of-bag (OOB) samples and
the same samples with the variable permuted. A permuted variable is one
whose values are resampled from its own distribution. The importance can
come out negative when a variable has few levels and those levels are
unbalanced: the resampling then barely changes the data, so accuracy on
the permuted data can slightly exceed accuracy on the original data. If
any importance value is negative, this commit shifts all importance
values so that the lowest one is 0.
Closes #231
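
The shift described above can be sketched in a few lines of Python. This
is only an illustration of the idea, not the code in this pull request;
the function name shift_to_non_negative and the plain-list input are
assumptions made for the example.

```python
def shift_to_non_negative(importances):
    """Return importance values shifted so that the lowest one is 0.

    Hypothetical helper for illustration; not the MADlib implementation.
    The shift is applied only when at least one value is negative, so
    results that are already non-negative are returned unchanged.
    """
    if not importances:
        return []
    lowest = min(importances)
    if lowest >= 0:
        # Nothing is negative: leave the values as they are.
        return list(importances)
    # Subtracting the (negative) minimum moves every value up by the
    # same amount, which preserves the ranking and the pairwise gaps.
    return [value - lowest for value in importances]


if __name__ == "__main__":
    print(shift_to_non_negative([0.5, -0.25, 0.0]))
```

Because every value moves by the same constant, the relative ordering of
the variables is unchanged; only the zero point moves.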
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/iyerr3/incubator-madlib bugfix/rf_neg_var_imp
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/madlib/pull/231.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #231
----
commit f4265854dd94899145c9b40d4ce77450f34bdd78
Author: Rahul Iyer <riyer@...>
Date: 2018-02-06T16:20:49Z
RF: Output non-negative importance values
----