Github user jingyimei commented on a diff in the pull request:
https://github.com/apache/madlib/pull/301#discussion_r205910952
--- Diff:
src/ports/postgres/modules/recursive_partitioning/test/decision_tree.sql_in ---
@@ -282,13 +283,16 @@ SELECT tree_train('dt_golf'::text, -- source table
6::integer, -- min split
2::integer, -- min bucket
3::integer, -- number of bins per continuous variable
- 'cp=0.01, n_folds=2' -- cost-complexity pruning parameter
+ 'cp=0.01, n_folds=2', -- cost-complexity pruning parameter
+ 'null_as_category=True'
);
SELECT _print_decision_tree(tree) from train_output;
SELECT tree_display('train_output', False);
SELECT impurity_var_importance FROM train_output;
+SELECT cat_levels_in_text, cat_n_levels, impurity_var_importance FROM train_output;
--- End diff --
Would it be better to assert that cat_n_levels == {1} instead of just querying it?
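Something along these lines might work. This is only a sketch: it assumes the `assert(boolean, text)` helper used elsewhere in the MADlib install-check tests is available here, and that `{1}` really is the expected value of `cat_n_levels` for this case. The error message text is just a placeholder.

```sql
-- Sketch only: the expected value {1} and the message text are assumptions.
-- cat_n_levels is compared as an integer array against ARRAY[1].
SELECT assert(
    cat_n_levels = ARRAY[1],
    'Decision tree: unexpected cat_n_levels with null_as_category=True')
FROM train_output;
```

That way the test fails if the value changes, rather than only printing it.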
---