Thanks for the question, Esther.  What version of MADlib are you using and
what database platform and version are you running on?

Judging by the error message, you appear to be on a MADlib version older than
1.8, since the 1.8 release reports this error differently.  (1.8 included a
bug fix that allows user-specified column names in PCA.)
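
In case it helps, here is a minimal sketch of how to check both versions from
a SQL prompt. It assumes MADlib is installed in the "madlib" schema, as the
pca_train call in your message suggests; adjust the schema name if your
install differs.

```sql
-- MADlib version and build information
-- (assumes the install schema is named "madlib")
SELECT madlib.version();

-- Database platform and version (PostgreSQL, Greenplum, or HAWQ)
SELECT version();
```

If the first query reports a version below 1.8, that would be consistent with
the error text you are seeing.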

Frank





On Mon, Apr 4, 2016 at 4:27 PM, Esther Vasiete <[email protected]> wrote:

> Hi,
>
> I am trying to use pca_train but I am running into this error:
>
> ERROR: plpy.SPIError: plpy.SPIError: plpy.SPIError: plpy.SPIError:
> Function "madlib.__matrix_densify_sfunc(double
> precision[],integer,integer,double precision)": invalid argument - col
> should be in the range of [0, col_dim)  (seg35 awsaiuirl1178:40003
> pid=104068) (plpython.c:4648)
> SQL state: XX000
> Context: Traceback (most recent call last):
>   PL/Python function "pca_train", line 23, in <module>
>     return pca.pca(**globals())
>   PL/Python function "pca_train", line 404, in pca
> PL/Python function "pca_train"
>
> My input table has 15472 rows and two columns; a row_id and an array with
> 853 features. I am calling pca_train like this:
>
> DROP TABLE if exists ev.hci_subset_pca_output;
> SELECT madlib.pca_train( 'ev.hci_subset_pca_input',
>                                            'ev.hci_subset_pca_output',
>                                            'row_id',
>                                             3);
>
> I unfortunately cannot share the data but this is how it looks in
> pgAdmin3. Note that pgAdmin3 won't show a feature_vector that is too
> large, which is why it appears to be empty, but it isn't, as you can see in
> the second screenshot.
>
> [image: Inline image 1]
>
> [image: Inline image 3]
>
> I am not sure why I am running into this error. Please advise.
>
> Update: I have renamed feature_vector to "row_vec" and "row_id" starts
> with 1. Still getting the same error.
>
> Thanks,
>
> --
> *Esther Vasiete *
> *Data Scientist | Pivotal*
> [email protected]
>
>
>
