Upgrading to MADlib 1.8 solved the problem!

Thanks,
Esther

On Tue, Apr 5, 2016 at 10:27 AM, Esther Vasiete <[email protected]> wrote:

> Oh sorry, it is HAWQ 1.3.1.
>
> And the data engineer will upgrade to MADlib 1.8 tonight.
>
> Thanks,
> Esther
>
> On Tue, Apr 5, 2016 at 9:26 AM, Frank McQuillan <[email protected]>
> wrote:
>
>> Please clarify the platform - do you mean GPDB 4.2.0?
>>
>> Would you be able to upgrade to MADlib 1.8? Then you are using the
>> latest software and we can see if you still have a problem.
>>
>> Frank
>>
>> On Tue, Apr 5, 2016 at 9:20 AM, Esther Vasiete <[email protected]>
>> wrote:
>>
>>> I am using MADlib 1.7.1 on HAWQ 4.2.0.
>>>
>>> Thanks.
>>>
>>> On Mon, Apr 4, 2016 at 8:04 PM, Frank McQuillan <[email protected]>
>>> wrote:
>>>
>>>> Thanks for the question, Esther. What version of MADlib are you using
>>>> and what database platform and version are you running on?
>>>>
>>>> It seems to be a MADlib version lower than 1.8 since the error message
>>>> you report is different in the 1.8 release. (There was a bug fix in 1.8 to
>>>> allow user-specified column names in PCA.)
>>>>
>>>> Frank
>>>>
>>>> On Mon, Apr 4, 2016 at 4:27 PM, Esther Vasiete <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to use pca_train but I am running into this error:
>>>>>
>>>>> ERROR: plpy.SPIError: plpy.SPIError: plpy.SPIError: plpy.SPIError:
>>>>> Function "madlib.__matrix_densify_sfunc(double
>>>>> precision[],integer,integer,double precision)": invalid argument - col
>>>>> should be in the range of [0, col_dim) (seg35 awsaiuirl1178:40003
>>>>> pid=104068) (plpython.c:4648)
>>>>> SQL state: XX000
>>>>> Context: Traceback (most recent call last):
>>>>> PL/Python function "pca_train", line 23, in <module>
>>>>> return pca.pca(**globals())
>>>>> PL/Python function "pca_train", line 404, in pca
>>>>> PL/Python function "pca_train"
>>>>>
>>>>> My input table has 15472 rows and two columns: a row_id and an array
>>>>> with 853 features. I am calling pca_train like this:
>>>>>
>>>>> DROP TABLE if exists ev.hci_subset_pca_output;
>>>>> SELECT madlib.pca_train( 'ev.hci_subset_pca_input',
>>>>> 'ev.hci_subset_pca_output',
>>>>> 'row_id',
>>>>> 3);
>>>>>
>>>>> I unfortunately cannot share the data but this is how it looks in
>>>>> pgAdmin3. Note that pgAdmin3 won't show a feature_vector that is too
>>>>> large, which is why it appears to be empty; it isn't, as you can see
>>>>> in the second screenshot.
>>>>>
>>>>> [image: Inline image 1]
>>>>>
>>>>> [image: Inline image 3]
>>>>>
>>>>> I am not sure why I am running into this error. Please advise.
>>>>>
>>>>> Update: I have renamed feature_vector to "row_vec" and "row_id" starts
>>>>> with 1. Still getting the same error.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> --
>>>>> *Esther Vasiete *
>>>>> *Data Scientist | Pivotal*
>>>>> [email protected]
>>>>>
>>>
>>> --
>>> *Esther Vasiete *
>>> *Data Scientist | Pivotal*
>>> [email protected]
>>>
>
> --
> *Esther Vasiete *
> *Data Scientist | Pivotal*
> [email protected]
>

--
*Esther Vasiete *
*Data Scientist | Pivotal*
[email protected]
