RF: Fix user doc examples

riyer Wed, 01 Aug 2018 13:00:55 -0700

DT/RF: Fix user doc examples


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/186390f7
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/186390f7
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/186390f7

Branch: refs/heads/master
Commit: 186390f7c2af5ad886a4d5b77d0792b68cd3414d
Parents: 1aac377
Author: Frank McQuillan <fmcquil...@pivotal.io>
Authored: Wed Aug 1 12:49:10 2018 -0700
Committer: Rahul Iyer <ri...@apache.org>
Committed: Wed Aug 1 12:58:44 2018 -0700

----------------------------------------------------------------------
 .../recursive_partitioning/decision_tree.sql_in     | 16 ++++++++++------
 .../recursive_partitioning/random_forest.sql_in     | 12 +++++++-----
 2 files changed, 17 insertions(+), 11 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/madlib/blob/186390f7/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in
----------------------------------------------------------------------
diff --git a/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in b/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in
index 469f1b2..5926152 100644
--- a/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in
+++ b/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in
@@ -284,14 +284,17 @@ tree_train(
       <th>impurity_var_importance</th>
       <td>DOUBLE PRECISION[]. Impurity importance of each variable.
       The order of the variables is the same as
-      that of 'independent_varnames' column in the summary table (see below).
+      that of the 'independent_varnames' column in the summary table (see below).
 
       The impurity importance of any feature is the decrease in impurity by a
       node containing the feature as a primary split, summed over the whole
       tree. If surrogates are used, then the importance value includes the
       impurity decrease scaled by the adjusted surrogate agreement.
-      Reported importance values are normalized to sum to 100 across
-      all variables.
+      Importance values are displayed as raw values as per the 'split_criterion'
+      parameter.
+      To see importance values normalized to sum to 100 across
+      all variables, use the importance display helper function 
+      described later on this page. 
       Please refer to [1] for more information on variable importance.
       </td>
       </tr>
@@ -727,7 +730,7 @@ independent_var_types       | text, boolean, double precision
 n_folds                     | 0
 null_proxy                  |
 </pre>
-View the impurity importance table using the helper function:
+View the normalized impurity importance table using the helper function:
 <pre class="example">
 \\x off
 DROP TABLE IF EXISTS imp_output;
@@ -1111,10 +1114,11 @@ which shows ordering of levels of categorical variables 'vs' and 'cyl':
 SELECT pruning_cp, cat_levels_in_text, cat_n_levels, impurity_var_importance, tree_depth FROM train_output;
 </pre>
 <pre class="result">
+-[ RECORD 1 ]-----------+------------------------------------------------------------------------
 pruning_cp              | 0
 cat_levels_in_text      | {0,1,4,6,8}
 cat_n_levels            | {2,3}
-impurity_var_importance | {0,51.8593201959496,10.976977929129,5.31897402755374,31.8447278473677}
+impurity_var_importance | {0,22.6309172500675,4.79024943310651,2.32115000000003,13.8967382920111}
 tree_depth              | 4
 </pre>
 View the summary table:
@@ -1147,7 +1151,7 @@ independent_var_types       | integer, integer, double precision, double precisi
 n_folds                     | 0
 null_proxy                  |
 </pre>
-View the impurity importance table using the helper function:
+View the normalized impurity importance table using the helper function:
 <pre class="example">
 \\x off
 DROP TABLE IF EXISTS imp_output;

http://git-wip-us.apache.org/repos/asf/madlib/blob/186390f7/src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in
----------------------------------------------------------------------
diff --git a/src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in b/src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in
index 39b6f5d..5b5a0f0 100644
--- a/src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in
+++ b/src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in
@@ -164,7 +164,9 @@ forest_train(training_table_name,
     Due to nature of permutation, the importance value can end up being
     negative if the number of levels for a categorical variable is small and is
     unbalanced. In such a scenario, the importance values are shifted to ensure
-    that the lowest importance value is 0.
+    that the lowest importance value is 0.  To see importance values normalized
+    to sum to 100 across all variables, use the importance display helper function
+    described later on this page.
 
   </DD>
 
@@ -758,7 +760,7 @@ the variables in 'independent_varnames'
 in <model_table>_summary.
 A higher value means higher importance for the
 variable.  We can use the helper function to
-get a better view of variable importance:
+get a normalized view of variable importance:
 <pre class="example">
 \\x off
 DROP TABLE IF EXISTS imp_output;
@@ -1160,7 +1162,7 @@ oob_error               | 16.5197718747446
 oob_var_importance      | {5.22711111111111,10.0872041666667,9.6875362244898,3.97782,2.99447839506173}
 impurity_var_importance | {5.1269704861111,7.04765974920884,20.9817274159476,4.02800949238769,10.5539079705215}
 </pre>
-Use the helper function to display variable importance:
+Use the helper function to display normalized variable importance:
 <pre class="example">
 \\x off
 DROP TABLE IF EXISTS mt_imp_output;
@@ -1347,14 +1349,14 @@ View the summary table:
 SELECT * FROM train_output_group;
 </pre>
 <pre class='result'>
--[ RECORD 1 ]-----------+-----------------------------------------------------
+-[ RECORD 1 ]-----------+-----------------------------------------
 gid                     | 1
 success                 | t
 cat_n_levels            | {2,2,2}
 cat_levels_in_text      | {US,__NULL__,rainy,__NULL__,NY,__NULL__}
 oob_error               | 1.00000000000000000000
 oob_var_importance      | {0,0,0}
-impurity_var_importance | {32.1752184623349,25.2686155402256,22.5560374792348}
+impurity_var_importance | {0.125,0.0944444444444,0.1836666666667}
 </pre>
 
 -# Predict for data not previously seen by assuming NULL
