[
https://issues.apache.org/jira/browse/MADLIB-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074937#comment-16074937
]
Frank McQuillan commented on MADLIB-1087:
-----------------------------------------
Yes, this is an issue in 1.9.1 also.
It is fixed for the next release, 1.12.
But the PR has been merged, so you could build from master if you need it now.
Frank
> Random Forest fails if features are INT or NUMERIC only and variable
> importance is TRUE
> ---------------------------------------------------------------------------------------
>
> Key: MADLIB-1087
> URL: https://issues.apache.org/jira/browse/MADLIB-1087
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Random Forest
> Reporter: Paul Chang
> Assignee: Rahul Iyer
> Priority: Minor
> Fix For: v1.12
>
>
> If we attempt to train on a dataset where all features are either INT or
> NUMERIC and variable importance is set to TRUE, forest_train() fails with
> the following error:
> [2017-04-03 13:35:35] [XX000] ERROR: plpy.SPIError: invalid array length (plpython.c:4648)
> [2017-04-03 13:35:35] Detail: array_of_bigint: Size should be in [1, 1e7], 0 given
> [2017-04-03 13:35:35] Where: Traceback (most recent call last):
> [2017-04-03 13:35:35] PL/Python function "forest_train", line 42, in <module>
> [2017-04-03 13:35:35] sample_ratio
> [2017-04-03 13:35:35] PL/Python function "forest_train", line 591, in forest_train
> [2017-04-03 13:35:35] PL/Python function "forest_train", line 1038, in _calculate_oob_prediction
> [2017-04-03 13:35:35] PL/Python function "forest_train"
> However, if we add a single feature column that is FLOAT, REAL, or DOUBLE
> PRECISION, the trainer does not fail.
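For anyone wanting to reproduce the report above, here is a minimal sketch. It is not part of the original report: the table, columns, and data are made up for illustration, and it assumes the MADlib 1.x forest_train() argument order, which may differ slightly between versions.

    DROP TABLE IF EXISTS rf_int_demo, rf_int_model, rf_int_model_group, rf_int_model_summary;

    -- All feature columns are INT or NUMERIC; no FLOAT/REAL/DOUBLE PRECISION column.
    CREATE TABLE rf_int_demo (
        id     SERIAL,
        f1     INT,
        f2     NUMERIC,
        label  INT
    );
    INSERT INTO rf_int_demo (f1, f2, label) VALUES
        (1, 0.5, 0), (2, 1.5, 0), (3, 2.5, 1), (4, 3.5, 1),
        (5, 4.5, 0), (6, 5.5, 1), (7, 6.5, 0), (8, 7.5, 1);

    SELECT madlib.forest_train(
        'rf_int_demo',      -- source table
        'rf_int_model',     -- output model table
        'id',               -- id column
        'label',            -- dependent variable
        'f1, f2',           -- feature columns (INT and NUMERIC only)
        NULL,               -- columns to exclude
        NULL,               -- grouping columns
        10::integer,        -- number of trees
        1::integer,         -- number of random features per split
        TRUE::boolean,      -- variable importance = TRUE, the condition described above
        1::integer,         -- number of permutations
        8::integer,         -- max tree depth
        3::integer,         -- min split
        1::integer,         -- min bucket
        10::integer         -- number of splits for continuous variables
    );

On the affected versions (1.9.1 up to the fix in 1.12), this call should raise the plpy.SPIError shown above; casting one feature column to DOUBLE PRECISION, or building from master, is expected to avoid it.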