Re: pca_train error

Esther Vasiete Wed, 06 Apr 2016 15:55:09 -0700

Upgrading to MADlib 1.8 solved the problem!

Thanks,
Esther


On Tue, Apr 5, 2016 at 10:27 AM, Esther Vasiete <[email protected]> wrote:

> Oh sorry, it is HAWQ 1.3.1.
>
> And the data engineer will upgrade to MADlib 1.8 tonight.
>
> Thanks,
> Esther
>
> On Tue, Apr 5, 2016 at 9:26 AM, Frank McQuillan <[email protected]>
> wrote:
>
>> Please clarify the platform - do you mean GPDB 4.2.0?
>>
>> Would you be able to upgrade to MADlib 1.8?  Then you would be on the
>> latest release and we can see whether the problem persists.
>>
>> Frank
>>
>> On Tue, Apr 5, 2016 at 9:20 AM, Esther Vasiete <[email protected]>
>> wrote:
>>
>>> I am using MADlib 1.7.1 on HAWQ 4.2.0.
>>>
>>> Thanks.
>>>
>>> On Mon, Apr 4, 2016 at 8:04 PM, Frank McQuillan <[email protected]>
>>> wrote:
>>>
>>>> Thanks for the question, Esther.  What version of MADlib are you using
>>>> and what database platform and version are you running on?
>>>>
>>>> It seems to be a MADlib version lower than 1.8 since the error message
>>>> you report is different in the 1.8 release.  (There was a bug fix in 1.8 to
>>>> allow user-specified column names in PCA.)
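>>>>
>>>> If it helps, a query along these lines should report the installed
>>>> release. This is only a sketch and assumes MADlib is installed in the
>>>> default "madlib" schema; adjust the schema name if yours differs.
>>>>
>>>>     -- Returns a string describing the installed MADlib version.
>>>>     SELECT madlib.version();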
>>>>
>>>> Frank
>>>>
>>>> On Mon, Apr 4, 2016 at 4:27 PM, Esther Vasiete <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to use pca_train but I am running into this error:
>>>>>
>>>>> ERROR: plpy.SPIError: plpy.SPIError: plpy.SPIError: plpy.SPIError:
>>>>> Function "madlib.__matrix_densify_sfunc(double
>>>>> precision[],integer,integer,double precision)": invalid argument - col
>>>>> should be in the range of [0, col_dim)  (seg35 awsaiuirl1178:40003
>>>>> pid=104068) (plpython.c:4648)
>>>>> SQL state: XX000
>>>>> Context: Traceback (most recent call last):
>>>>>   PL/Python function "pca_train", line 23, in <module>
>>>>>     return pca.pca(**globals())
>>>>>   PL/Python function "pca_train", line 404, in pca
>>>>> PL/Python function "pca_train"
>>>>>
>>>>> My input table has 15472 rows and two columns: a row_id and an array
>>>>> with 853 features. I am calling pca_train like this:
>>>>>
>>>>> DROP TABLE IF EXISTS ev.hci_subset_pca_output;
>>>>> SELECT madlib.pca_train('ev.hci_subset_pca_input',
>>>>>                         'ev.hci_subset_pca_output',
>>>>>                         'row_id',
>>>>>                         3);
>>>>>
>>>>> Unfortunately I cannot share the data, but this is how it looks in
>>>>> pgAdmin3. Note that pgAdmin3 won't display a feature_vector that is too
>>>>> large, which is why the column appears to be empty. It isn't, as you can
>>>>> see in the second screenshot.
>>>>>
>>>>> [image: Inline image 1]
>>>>>
>>>>> [image: Inline image 3]
>>>>>
>>>>> I am not sure why I am running into this error. Please advise.
>>>>>
>>>>> Update: I have renamed feature_vector to "row_vec", and "row_id" starts
>>>>> at 1. I am still getting the same error.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> --
>>>>> *Esther Vasiete *
>>>>> *Data Scientist | Pivotal*
>>>>> [email protected]



-- 
*Esther Vasiete *
*Data Scientist | Pivotal*
[email protected]
