[
https://issues.apache.org/jira/browse/MADLIB-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954138#comment-15954138
]
Paul Chang commented on MADLIB-1087:
------------------------------------
I am running the following Greenplum and MADlib versions:
PostgreSQL 8.2.15 (Greenplum Database 4.3.10.0 build commit:
f413ff3b006655f14b6b9aa217495ec94da5c96c) on x86_64-unknown-linux-gnu, compiled
by GCC gcc (GCC) 4.4.2 compiled on Oct 21 2016 19:36:26
MADlib version: 1.9, git revision: rc/v1.9-rc1, cmake configuration time: Thu
Apr 7 18:43:03 UTC 2016, build type: Release, build system:
Linux-2.6.18-238.27.1.el5.hotfix.bz516490, C compiler: gcc 4.4.0, C++ compiler:
g++ 4.4.0
> Random Forest fails if features are INT or NUMERIC only and variable
> importance is TRUE
> ---------------------------------------------------------------------------------------
>
> Key: MADLIB-1087
> URL: https://issues.apache.org/jira/browse/MADLIB-1087
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Random Forest
> Reporter: Paul Chang
>
> If we train on a dataset in which every feature is either INT or NUMERIC,
> and variable importance is set to TRUE, forest_train() fails with the
> following error:
> [2017-04-03 13:35:35] [XX000] ERROR: plpy.SPIError: invalid array length (plpython.c:4648)
> [2017-04-03 13:35:35] Detail: array_of_bigint: Size should be in [1, 1e7], 0 given
> [2017-04-03 13:35:35] Where: Traceback (most recent call last):
> [2017-04-03 13:35:35] PL/Python function "forest_train", line 42, in <module>
> [2017-04-03 13:35:35] sample_ratio
> [2017-04-03 13:35:35] PL/Python function "forest_train", line 591, in forest_train
> [2017-04-03 13:35:35] PL/Python function "forest_train", line 1038, in _calculate_oob_prediction
> [2017-04-03 13:35:35] PL/Python function "forest_train"
> However, if we add a single feature column that is FLOAT, REAL, or DOUBLE
> PRECISION, the trainer does not fail.
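
A minimal sketch of the workaround described above, assuming it is acceptable to train from a view rather than the base table. The helper name float_feature_view_sql and the table, view, and column names (train_data, train_data_fp, age) are hypothetical and not taken from the report. The report only states that a FLOAT, REAL, or DOUBLE PRECISION feature column avoids the failure; whether a cast copy of an existing INT column is sufficient is an assumption here.

```python
def float_feature_view_sql(source_table, view_name, int_column, alias=None):
    """Build SQL for a view that adds a DOUBLE PRECISION copy of one column.

    Hypothetical helper: identifiers are interpolated as-is, so pass only
    trusted table and column names.
    """
    # Default alias for the new floating-point feature column.
    alias = alias or int_column + "_dbl"
    return (
        "CREATE VIEW {view} AS "
        "SELECT *, {col}::DOUBLE PRECISION AS {alias} "
        "FROM {table};"
    ).format(view=view_name, col=int_column, alias=alias, table=source_table)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(float_feature_view_sql("train_data", "train_data_fp", "age"))
```

The generated statement would be run once, and the new column would then have to be included among the features passed to forest_train() so that at least one feature is floating point.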