Re: pca_train error

Esther Vasiete Tue, 05 Apr 2016 10:28:35 -0700

Oh sorry, it is HAWQ 1.3.1.

And the data engineer will upgrade to MADlib 1.8 tonight.


Thanks,
Esther

On Tue, Apr 5, 2016 at 9:26 AM, Frank McQuillan <[email protected]>
wrote:

> Please clarify the platform - do you mean GPDB 4.2.0?
>
> Would you be able to upgrade to MADlib 1.8?  Then you are using the latest
> software and we can see if you still have a problem.
>
> Frank
>
> On Tue, Apr 5, 2016 at 9:20 AM, Esther Vasiete <[email protected]>
> wrote:
>
>> I am using MADlib 1.7.1 on HAWQ 4.2.0.
>>
>> Thanks.
>>
>> On Mon, Apr 4, 2016 at 8:04 PM, Frank McQuillan <[email protected]>
>> wrote:
>>
>>> Thanks for the question, Esther.  What version of MADlib are you using
>>> and what database platform and version are you running on?
>>>
>>> It seems to be a MADlib version lower than 1.8 since the error message
>>> you report is different in the 1.8 release.  (There was a bug fix in 1.8 to
>>> allow user-specified column names in PCA.)
>>>
>>> Frank
>>>
>>> On Mon, Apr 4, 2016 at 4:27 PM, Esther Vasiete <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am trying to use pca_train but I am running into this error:
>>>>
>>>> ERROR: plpy.SPIError: plpy.SPIError: plpy.SPIError: plpy.SPIError:
>>>> Function "madlib.__matrix_densify_sfunc(double
>>>> precision[],integer,integer,double precision)": invalid argument - col
>>>> should be in the range of [0, col_dim)  (seg35 awsaiuirl1178:40003
>>>> pid=104068) (plpython.c:4648)
>>>> SQL state: XX000
>>>> Context: Traceback (most recent call last):
>>>>   PL/Python function "pca_train", line 23, in <module>
>>>>     return pca.pca(**globals())
>>>>   PL/Python function "pca_train", line 404, in pca
>>>> PL/Python function "pca_train"
>>>>
>>>> My input table has 15472 rows and two columns; a row_id and an array
>>>> with 853 features. I am calling pca_train like this:
>>>>
>>>> DROP TABLE IF EXISTS ev.hci_subset_pca_output;
>>>> SELECT madlib.pca_train('ev.hci_subset_pca_input',
>>>>                         'ev.hci_subset_pca_output',
>>>>                         'row_id',
>>>>                         3);
>>>>
>>>> Unfortunately I cannot share the data, but this is how it looks in
>>>> pgAdmin3. Note that pgAdmin3 won't display a feature_vector that is too
>>>> large, which is why the column appears to be empty; it isn't, as you can
>>>> see in the second screenshot.
>>>>
>>>> [image: Inline image 1]
>>>>
>>>> [image: Inline image 3]
>>>>
>>>> I am not sure why I am running into this error. Please advise.
>>>>
>>>> Update: I have renamed feature_vector to "row_vec" and "row_id" starts
>>>> with 1. Still getting the same error.
>>>>
>>>> Thanks,
>>>>
>>>> --
>>>> *Esther Vasiete *
>>>> *Data Scientist | Pivotal*
>>>> [email protected]
>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> *Esther Vasiete *
>> *Data Scientist | Pivotal*
>> [email protected]
>>
>
>


-- 
*Esther Vasiete *
*Data Scientist | Pivotal*
[email protected]
