[
https://issues.apache.org/jira/browse/MADLIB-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954153#comment-15954153
]
Frank McQuillan edited comment on MADLIB-1087 at 4/3/17 9:18 PM:
-----------------------------------------------------------------
I just tried it and see the same thing you do. NUMERIC should be cast internally to
DOUBLE PRECISION, but it seems that cast got missed.
Whoever takes this on, please check all random forest and decision tree
parameters to confirm that we are handling NUMERIC properly.
(Noticed you are on an older version of MADlib, Paul. The latest release is
1.10, though I don't recall any changes to RF in the last couple of releases, so
this probably won't affect the issue you are reporting.)
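Until the internal cast is fixed, a possible workaround is to cast the INT/NUMERIC feature columns to DOUBLE PRECISION in a view before calling forest_train(). This is only a sketch: the table name rf_input, its columns, and the view name below are hypothetical, and the parameter values are illustrative.

```sql
-- Hypothetical input table: rf_input(id, f1 INT, f2 NUMERIC, label INT).
-- Casting every feature to DOUBLE PRECISION avoids the all-INT/NUMERIC case
-- that triggers the error when importance is TRUE.
CREATE VIEW rf_input_cast AS
SELECT id,
       f1::DOUBLE PRECISION AS f1,
       f2::DOUBLE PRECISION AS f2,
       label
FROM rf_input;

SELECT madlib.forest_train(
    'rf_input_cast',   -- training table (the casting view)
    'rf_model',        -- output model table
    'id',              -- id column
    'label',           -- dependent variable
    'f1, f2',          -- features, now DOUBLE PRECISION
    NULL,              -- features to exclude
    NULL,              -- grouping columns
    100,               -- number of trees
    NULL,              -- num random features (default)
    TRUE               -- variable importance
);
```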
was (Author: fmcquillan):
We will take a look, Paul.
Noticed you are on an older version of MADlib. The latest release is 1.10,
though I don't recall any changes to RF in the last couple releases so probably
won't affect the issue that you are reporting.
> Random Forest fails if features are INT or NUMERIC only and variable
> importance is TRUE
> ---------------------------------------------------------------------------------------
>
> Key: MADLIB-1087
> URL: https://issues.apache.org/jira/browse/MADLIB-1087
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Random Forest
> Reporter: Paul Chang
> Priority: Minor
> Fix For: v1.11
>
>
> If we attempt to train on a dataset where all features are either INT or
> NUMERIC, and with variable importance TRUE, forest_train() fails with the
> following error:
> [2017-04-03 13:35:35] [XX000] ERROR: plpy.SPIError: invalid array length (plpython.c:4648)
> [2017-04-03 13:35:35] Detail: array_of_bigint: Size should be in [1, 1e7], 0 given
> [2017-04-03 13:35:35] Where: Traceback (most recent call last):
> [2017-04-03 13:35:35] PL/Python function "forest_train", line 42, in <module>
> [2017-04-03 13:35:35] sample_ratio
> [2017-04-03 13:35:35] PL/Python function "forest_train", line 591, in forest_train
> [2017-04-03 13:35:35] PL/Python function "forest_train", line 1038, in _calculate_oob_prediction
> [2017-04-03 13:35:35] PL/Python function "forest_train"
> However, if we add a single feature column that is FLOAT, REAL, or DOUBLE
> PRECISION, training succeeds.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)