Thanks for the question, Esther. What version of MADlib are you using and what database platform and version are you running on?
It appears you are running a MADlib version earlier than 1.8, since the error message you report is different in the 1.8 release. (There was a bug fix in 1.8 to allow user-specified column names in PCA.)

Frank

On Mon, Apr 4, 2016 at 4:27 PM, Esther Vasiete wrote:
> Hi,
>
> I am trying to use pca_train but I am running into this error:
>
> ERROR: plpy.SPIError: plpy.SPIError: plpy.SPIError: plpy.SPIError:
> Function "madlib.__matrix_densify_sfunc(double precision[],integer,integer,double precision)": invalid argument - col should be in the range of [0, col_dim) (seg35 awsaiuirl1178:40003 pid=104068) (plpython.c:4648)
> SQL state: XX000
> Context: Traceback (most recent call last):
>   PL/Python function "pca_train", line 23, in <module>
>     return pca.pca(**globals())
>   PL/Python function "pca_train", line 404, in pca
> PL/Python function "pca_train"
>
> My input table has 15472 rows and two columns: a row_id and an array with
> 853 features. I am calling pca_train like this:
>
> DROP TABLE if exists ev.hci_subset_pca_output;
> SELECT madlib.pca_train('ev.hci_subset_pca_input',
>                         'ev.hci_subset_pca_output',
>                         'row_id',
>                         3);
>
> I unfortunately cannot share the data, but this is how it looks in
> pgAdmin3. Note that pgAdmin3 won't show a feature_vector that is too
> large, which is why it appears to be empty, but it isn't, as you can see in
> the second screenshot.
>
> [image: Inline image 1]
>
> [image: Inline image 3]
>
> I am not sure why I am running into this error. Please advise.
>
> Update: I have renamed feature_vector to "row_vec" and "row_id" starts
> with 1. Still getting the same error.
>
> Thanks,
>
> --
> *Esther Vasiete*
> *Data Scientist | Pivotal*
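Since the update mentions that row_id now starts at 1, and the MADlib pca_train documentation describes the dense-format row_id as contiguous integers from 1 to N, a quick pre-flight check of the input can rule out malformed data before (or in addition to) upgrading. Below is a minimal sketch of such a check in Python. The helper `check_pca_input` is hypothetical and not part of MADlib; it assumes the (row_id, row_vec) pairs have already been fetched from the table, and it reports NULL vectors or elements, arrays of differing length, and gaps or duplicates in row_id. Whether any of these actually triggers the `col should be in the range of [0, col_dim)` error in pre-1.8 releases is an assumption, not something confirmed in this thread.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical row shape: (row_id, row_vec), where row_vec may be NULL (None)
# or contain NULL elements.
Row = Tuple[int, Optional[Sequence[Optional[float]]]]


def check_pca_input(rows: Sequence[Row]) -> List[str]:
    """Report shape problems in (row_id, row_vec) pairs; an empty list means none found."""
    if not rows:
        return ["input is empty"]
    problems: List[str] = []

    # Rows whose vector is NULL or contains NULL elements.
    null_ids = [rid for rid, vec in rows if vec is None or any(x is None for x in vec)]
    if null_ids:
        problems.append(f"NULL vector or element in row_id(s): {null_ids}")

    # Every non-NULL feature array should have the same length (853 in the report).
    lengths = sorted({len(vec) for _, vec in rows if vec is not None})
    if len(lengths) > 1:
        problems.append(f"row_vec lengths differ: {lengths}")

    # Assumed requirement: row_id is exactly 1..N with no gaps and no duplicates.
    ids = sorted(rid for rid, _ in rows)
    if ids != list(range(1, len(ids) + 1)):
        problems.append("row_id is not a contiguous 1..N sequence")

    return problems


if __name__ == "__main__":
    # Toy data only, deliberately malformed to exercise each check.
    sample = [(1, [0.5, 1.0, 2.0]), (2, [0.1, 0.2]), (4, [0.3, None, 0.9])]
    for issue in check_pca_input(sample):
        print(issue)
```

The same checks can be run directly in the database, for example by selecting the distinct values of `array_length(row_vec, 1)` and comparing `count(*)`, `count(DISTINCT row_id)`, `min(row_id)` and `max(row_id)`, which avoids pulling 15472 rows of 853 features to a client.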
