[ 
https://issues.apache.org/jira/browse/MADLIB-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433092#comment-16433092
 ] 

ASF GitHub Bot commented on MADLIB-1225:
----------------------------------------

GitHub user asfgit closed the pull request at:

    https://github.com/apache/madlib/pull/258


> Sporadic install check failures in random forest
> ------------------------------------------------
>
>                 Key: MADLIB-1225
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1225
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Random Forest
>            Reporter: Nandish Jayaram
>            Priority: Major
>             Fix For: v1.14
>
>
> The install check for random forest fails sporadically. The failure 
> occurs in the install-check test that exercises variable importance.
> The error in the log when a failure happens is:
> {code}
> SELECT
>  assert(cat_var_importance[1] > con_var_importance[1], 'class should be important!'),
>  assert(cat_var_importance[1] > cat_var_importance[2], 'class should be important!')
> FROM train_output_group;
> psql:/tmp/madlib.WW_EyD/recursive_partitioning/test/random_forest.sql_in.tmp:158: ERROR: Failed assertion: class should be important! (seg0 slice1 93e250c8-8924-4a80-5c68-1464f40b0395:25432 pid=91044)
> {code}
> The last RF install-check query that was run before the error was:
> {code}
> SELECT forest_train(
>  'dt_golf', -- source table
>  'train_output', -- output model table
>  'id', -- id column
>  'class::TEXT', -- response
>  'class, windy, temperature', -- features
>  NULL, -- exclude columns
>  NULL, -- no grouping
>  10, -- num of trees
>  1, -- num of random features
>  TRUE, -- importance
>  3, -- num_permutations
>  10, -- max depth
>  1, -- min split
>  1, -- min bucket
>  8, -- number of bins per continuous variable
>  'max_surrogates=0',
>  FALSE
>  );
> SELECT * from train_output_summary;
> -[ RECORD 1 ]---------+--------------------------------
> method | forest_train
> is_classification | t
> source_table | dt_golf
> model_table | train_output
> id_col_name | id
> dependent_varname | class::TEXT
> independent_varnames | class,windy,temperature
> cat_features | class,windy
> con_features | temperature
> grouping_cols |
> num_trees | 10
> num_random_features | 1
> max_tree_depth | 10
> min_split | 1
> min_bucket | 1
> num_splits | 8
> verbose | f
> importance | t
> num_permutations | 3
> num_all_groups | 1
> num_failed_groups | 0
> total_rows_processed | 16
> total_rows_skipped | 0
> dependent_var_levels | "Don't Play","Play"
> dependent_var_type | text
> independent_var_types | text, boolean, double precision
> null_proxy | None
> SELECT * from train_output_group;
> -[ RECORD 1 ]------+---------------------------------------
> gid | 1
> success | t
> cat_n_levels | {2,2}
> cat_levels_in_text | {"Don't Play",Play,False,True}
> oob_error | 0.20000000000000000000
> cat_var_importance | {0.0244444444444445,0.025487012987013}
> con_var_importance | {0}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
