[https://issues.apache.org/jira/browse/MADLIB-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16399697#comment-16399697]
ASF GitHub Bot commented on MADLIB-1215:
----------------------------------------
GitHub user iyerr3 opened a pull request:
https://github.com/apache/madlib/pull/242
PCA: Fix issue with text grouping col input
JIRA: MADLIB-1215
PCA fails when the grouping column is a text column (a common use case).
This is because, when building the per-group subset, the column is
compared to each group value in a WHERE clause without quoting the value.
This commit adds single quotes around the value.
Other changes include whitespace cleanup and PEP 8 conformance changes.
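The quoting fix can be illustrated with a minimal sketch. The helper names below (`quote_literal`, `group_subset_query_unquoted`, `group_subset_query_quoted`) are hypothetical and are not MADlib's actual code; the query shape is copied from the error log in the issue. Quoting the value produces an untyped string literal, which PostgreSQL resolves to the column's type, so the same query works for both text and integer grouping columns. The sketch assumes `standard_conforming_strings` is on, so only embedded single quotes need doubling; inside PL/Python, `plpy.quote_literal()` is the more robust choice.

```python
# Hypothetical helpers illustrating the fix; not MADlib's implementation.

def quote_literal(value):
    """Render a value as a SQL string literal by doubling embedded single
    quotes (assumes standard_conforming_strings is on)."""
    return "'" + str(value).replace("'", "''") + "'"


def group_subset_query_unquoted(source_table, grouping_col, group_value):
    # Buggy form: the value is interpolated bare, so a text column ends up
    # compared against an integer literal (text = integer -> error 42883).
    return ("SELECT ROW_NUMBER() OVER() AS row_id, row_vec "
            "FROM {0} WHERE {1}={2}").format(
                source_table, grouping_col, group_value)


def group_subset_query_quoted(source_table, grouping_col, group_value):
    # Fixed form: the value becomes an untyped string literal that
    # PostgreSQL coerces to the grouping column's type.
    return ("SELECT ROW_NUMBER() OVER() AS row_id, row_vec "
            "FROM {0} WHERE {1}={2}").format(
                source_table, grouping_col, quote_literal(group_value))


if __name__ == "__main__":
    print(group_subset_query_unquoted("mat_group_text", "matrix_id_text", "1"))
    print(group_subset_query_quoted("mat_group_text", "matrix_id_text", "1"))
```

Running the script prints the failing `WHERE matrix_id_text=1` form followed by the fixed `WHERE matrix_id_text='1'` form.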
Closes #242
Note to reviewers: It would help to see the diff without the whitespace
changes.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/madlib/madlib bugfix/pca_grouping_text
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/madlib/pull/242.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #242
----
commit cf4ce67fac9309281d5215c264904f05bd7d93bb
Author: Rahul Iyer <riyer@...>
Date: 2018-03-15T00:54:22Z
PCA: Fix issue with text grouping col input
----
> PCA error with text grouping column
> -----------------------------------
>
> Key: MADLIB-1215
> URL: https://issues.apache.org/jira/browse/MADLIB-1215
> Project: Apache MADlib
> Issue Type: Bug
> Reporter: Rashmi Raghu
> Priority: Minor
> Fix For: v1.14
>
>
> The issue is that PCA train does not run when the grouping column is text
> (I have not tested other non-integer data types). See below for the error,
> reproduced on a modified example from the docs.
> DROP TABLE IF EXISTS mat_group_text;
> CREATE TABLE mat_group_text (
> id integer,
> row_vec double precision[],
> matrix_id_text text
> );
> INSERT INTO mat_group_text VALUES
> (1, '{1,2,3}', '1'),
> (2, '{2,1,2}', '1'),
> (3, '{3,2,1}', '1'),
> (4, '{1,2,3,4,5}', '2'),
> (5, '{2,5,2,4,1}', '2'),
> (6, '{5,4,3,2,1}', '2');
> DROP TABLE IF EXISTS result_table_group_text, result_table_group_text_mean;
> SELECT madlib.pca_train('mat_group_text', -- Source table
> 'result_table_group_text', -- Output table
> 'id', -- Row id of source table
> 0.8, -- Proportion of variance
> 'matrix_id_text'); -- Grouping column
> SELECT * FROM result_table_group_text ORDER BY matrix_id_text, row_id;
> -- NOTICE: table "result_table_group_text" does not exist, skipping
> -- NOTICE: table "result_table_group_text_mean" does not exist, skipping
> -- ERROR: plpy.SPIError: plpy.SPIError: operator does not exist: text = integer
> -- LINE 5: WHERE matrix_id_text=1
> --                             ^
> -- HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
> -- QUERY:
> -- CREATE TABLE pg_temp.__madlib_temp_57228654_1520981521_47712361__group_0 AS
> -- SELECT ROW_NUMBER() OVER() AS row_id, row_vec
> -- FROM mat_group_text
> -- WHERE matrix_id_text=1
> --
> -- CONTEXT: Traceback (most recent call last):
> --   PL/Python function "pca_train", line 23, in <module>
> --     return pca.pca(**globals())
> --   PL/Python function "pca_train", line 87, in pca
> --   PL/Python function "pca_train", line 235, in pca_wrap
> -- PL/Python function "pca_train"
> -- ********** Error **********
> --
> -- ERROR: plpy.SPIError: plpy.SPIError: operator does not exist: text = integer
> -- SQL state: 42883
> -- Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts.
> -- Context: Traceback (most recent call last):
> --   PL/Python function "pca_train", line 23, in <module>
> --     return pca.pca(**globals())
> --   PL/Python function "pca_train", line 87, in pca
> --   PL/Python function "pca_train", line 235, in pca_wrap
> -- PL/Python function "pca_train"
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)