[ 
https://issues.apache.org/jira/browse/MADLIB-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005862#comment-16005862
 ] 

ASF GitHub Bot commented on MADLIB-1097:
----------------------------------------

Github user iyerr3 closed the pull request at:

    https://github.com/apache/incubator-madlib/pull/131


> Random Forest does not allow NULL values in features
> ----------------------------------------------------
>
>                 Key: MADLIB-1097
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1097
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Random Forest
>            Reporter: Nandish Jayaram
>            Assignee: Rahul Iyer
>            Priority: Minor
>             Fix For: v1.12
>
>
> Running forest_train() with features that have NULL values results in the 
> following error:
> {code}
> psql:/tmp/madlib.LkFR_5/recursive_partitioning/test/random_forest.sql_in.tmp:79:
>  ERROR:  spiexceptions.InvalidParameterValue: Function 
> "_rf_cat_imp_score(bytea8,integer[],double 
> precision[],integer[],integer,double precision,boolean,double precision[])": 
> Invalid type conversion. Null where not expected.
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "forest_train", line 42, in <module>
>     sample_ratio
>   PL/Python function "forest_train", line 605, in forest_train
>   PL/Python function "forest_train", line 1052, in _calculate_oob_prediction
> PL/Python function "forest_train"
> {code}
> The following are the input table and parameters used:
> {code:sql}
> CREATE TABLE dt_golf (
>     id integer NOT NULL,
>     "OUTLOOK" text,
>     temperature double precision,
>     humidity double precision,
>     windy boolean,
>     class text
> ) ;
> INSERT INTO dt_golf (id,"OUTLOOK",temperature,humidity,windy,class) VALUES
> (1, 'sunny', 85, 85, false, 'Don''t Play'),
> (2, 'sunny', 80, 90, true, 'Don''t Play'),
> (3, 'overcast', 83, 78, false, 'Play'),
> (4, 'rain', NULL, 96, false, 'Play'),
> (5, 'rain', 68, 80, NULL, 'Play'),
> (6, 'rain', 65, 70, true, 'Don''t Play'),
> (7, 'overcast', 64, 65, true, 'Play'),
> (8, 'sunny', 72, 95, false, 'Don''t Play'),
> (9, 'sunny', 69, 70, false, 'Play'),
> (10, 'rain', 75, 80, false, 'Play'),
> (11, 'sunny', 75, 70, true, 'Play'),
> (12, 'overcast', 72, 90, true, 'Play'),
> (13, 'overcast', 81, 75, false, 'Play'),
> (14, 'rain', 71, 80, true, 'Don''t Play');
> SELECT forest_train(
>                   'dt_golf'::TEXT,         -- source table
>                   'train_output'::TEXT,    -- output model table
>                   'id'::TEXT,              -- id column
>                   'class'::TEXT,           -- response
>                   'windy, temperature'::TEXT,   -- features
>                   NULL::TEXT,        -- exclude columns
>                   NULL::TEXT,        -- no grouping
>                   5,                -- num of trees
>                   1,                 -- num of random features
>                   TRUE::BOOLEAN,    -- importance
>                   1::INTEGER,       -- num_permutations
>                   10::INTEGER,       -- max depth
>                   1::INTEGER,        -- min split
>                   1::INTEGER,        -- min bucket
>                   8::INTEGER,        -- number of bins per continuous variable
>                   'max_surrogates=0',
>                   FALSE
>                   );
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to