[ 
https://issues.apache.org/jira/browse/MADLIB-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736505#comment-16736505
 ] 

Himanshu Pandey commented on MADLIB-1284:
-----------------------------------------

[~fmcquillan], [~nikhilkak]

Current Status:

1. Special character support: *$ ' ,*

Special character support for regular (non-JSON) column names is implemented. 
The *comma (,)* is a special case: a column name that contains a comma must be 
passed to the function in double quotes, e.g. "empl,oyee".

For example:
{code:sql}
select madlib.linregr_train('houses_spcl', 'result_lin_houses_spcl',
                            '"pr''ice"',
                            'array[1, "ta,x", "ba$th", size]',
                            '"bed,room","ta,x"', True);
{code}
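The quoting rule matters because the grouping-column argument is a single comma-separated string that presumably has to be split into individual columns, and a plain split on commas cuts "bed,room" in half. Below is a minimal sketch of a splitter that ignores commas inside double-quoted identifiers. `split_quoted` is a hypothetical helper written for illustration, not MADlib's actual utility:

```python
def split_quoted(cols, delimiter=","):
    """Split a comma-separated column list, ignoring delimiters
    that appear inside double-quoted identifiers (hypothetical helper)."""
    parts, current, in_quotes = [], [], False
    for ch in cols:
        if ch == '"':
            # Toggle on every double quote; an escaped "" inside an
            # identifier toggles twice and leaves the state unchanged.
            in_quotes = not in_quotes
        if ch == delimiter and not in_quotes:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


print('"bed,room","ta,x"'.split(","))     # ['"bed', 'room"', '"ta', 'x"']
print(split_quoted('"bed,room","ta,x"'))  # ['"bed,room"', '"ta,x"']
```

The quotes are kept on each part so the identifiers can be pasted back into generated SQL unchanged.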
For JSON, the above characters are also supported. However, if a grouping 
column contains a comma, it cannot be passed to the function in double quotes. 
Something like this will not work for JSON:
 *data->>"ta,x"*

So a comma in a JSON grouping column is still not supported, and extra work is 
needed to support it.

For JSON, this works:
{code:sql}
select linregr_train('houses_json', 'result_lin_houses_json',
                     '(data->>''pr''''ice'')::integer',
                     'array[1, (data->>''ta,x'')::integer, (data->>''ba$th'')::double precision, (data->>''size'')::integer]',
                     'data->>''bedroom'',data->>''lot''', True);
{code}
However, this does not work, because the JSON key in the grouping column 
contains a comma:
{code:sql}
select linregr_train('houses_json', 'result_lin_houses_json',
                     '(data->>''pr''''ice'')::integer',
                     'array[1, (data->>''ta,x'')::integer, (data->>''ba$th'')::double precision, (data->>''size'')::integer]',
                     'data->>''ta,x''', True);
{code}
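After SQL unescaping, the grouping argument in the failing call is the text data->>'ta,x'. Assuming the failure comes from comma-splitting that argument (the comment does not say exactly where it fails), the extra work would likely include a splitter that also understands single-quoted string literals. Below is a minimal sketch. `split_sql_exprs` is a hypothetical helper, not part of MADlib. It skips commas inside single-quoted literals, double-quoted identifiers and parentheses, but it does not handle E'' backslash escapes or dollar quoting:

```python
def split_sql_exprs(text):
    """Split a comma-separated list of SQL expressions, ignoring commas
    inside single-quoted literals, double-quoted identifiers, and
    parentheses (hypothetical helper; no E'' or $$ quoting support)."""
    parts, current = [], []
    in_single = in_double = False
    depth = 0
    for ch in text:
        if ch == "'" and not in_double:
            # A doubled '' inside a literal toggles twice, so state is preserved.
            in_single = not in_single
        elif ch == '"' and not in_single:
            in_double = not in_double
        elif not in_single and not in_double:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "," and depth == 0:
                # Top-level comma: finish the current expression.
                parts.append("".join(current).strip())
                current = []
                continue
        current.append(ch)
    parts.append("".join(current).strip())
    return parts


print(split_sql_exprs("data->>'ta,x'"))                  # ["data->>'ta,x'"]
print(split_sql_exprs("data->>'bedroom',data->>'lot'"))  # ["data->>'bedroom'", "data->>'lot'"]
```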
2. *JOIN* instead of *USING*

A USING clause accepts only column names, not expressions, so it has been 
replaced with a regular JOIN ... ON. This was done mainly to support JSON 
expressions.
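An ON condition, unlike USING, can equate arbitrary expressions. Below is a minimal sketch of building such a condition from pairs of expressions. The helper `build_on_clause` and the aliases `src` and `grp` are hypothetical, chosen for illustration rather than taken from MADlib's generated SQL:

```python
def build_on_clause(pairs):
    """Build an ON condition equating each left expression with its right
    counterpart; unlike USING, ON accepts arbitrary expressions."""
    return " AND ".join("({0}) = ({1})".format(left, right)
                        for left, right in pairs)


cond = build_on_clause([("src.data->>'bedroom'", "grp.bedroom"),
                        ("src.data->>'lot'", "grp.lot")])
print("JOIN grp ON " + cond)
```

Plain = keeps the USING behaviour, where rows whose grouping value is NULL do not match. If NULL groups should match, IS NOT DISTINCT FROM would be needed instead.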

Thanks,
Himanshu

> linregr_train fails when dependent variable is a JSONB element
> --------------------------------------------------------------
>
>                 Key: MADLIB-1284
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1284
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Linear Regression
>            Reporter: Nandish Jayaram
>            Assignee: Himanshu Pandey
>            Priority: Minor
>             Fix For: v1.16
>
>
> An issue reported in the user mailing list 
> (https://lists.apache.org/thread.html/ab645438d4ab6ab3508f3e7c790d2fc65fe845031bd481aa0bdff5f1@%3Cuser.madlib.apache.org%3E):
> I have a table that contains a JSONB field (Postgres 10.x) and am now looking 
> to analyze all that rich data with MADLib.  Example query:
> {quote}SELECT madlib.linregr_train (
>   'regr_example',         -- source table
>   'regr_example_model',   -- output model table
>   '(data->>''y'')::int',     -- dependent variable
>   'ARRAY[1, (data->>''x1'')::int, (data->>''x2'')::int]'      -- independent variables
> );{quote}
> However, it looks like MADLib isn't liking using these fields when it comes 
> to creating the temporary table:
> {quote}ERROR:  spiexceptions.SyntaxError: syntax error at or near "')::int'"
> LINE 7:                     , '(data->>'y')::int'::varchar      as d...
>                                          ^
> QUERY:  
>             create table regr_example_model_summary as
>                 select
>                       'linregr'::varchar                  as method
>                     , 'regr_example'::varchar           as source_table
>                     , 'regr_example_model'::varchar              as out_table
>                     , '(data->>'y')::int'::varchar      as dependent_varname
>                     , 'ARRAY[1, (data->>'x1')::int, (data->>'x2')::int]'::varchar    as independent_varname
>                     , 0::integer       as num_rows_processed
>                     , 4::integer         as num_missing_rows_skipped
>                     , NULL::text                as grouping_col
>            
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "linregr_train", line 20, in <module>
>     return linear.linregr_train(**globals())
>   PL/Python function "linregr_train", line 146, in linregr_train
> PL/Python function "linregr_train"{quote}
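The QUERY above shows the cause of the error. The dependent-variable text (data->>'y')::int is pasted between single quotes without escaping, so its embedded ' ends the string literal early. Below is a minimal sketch of the usual fix, which escapes the text before interpolating it. `quote_literal` here is a hypothetical Python helper that mirrors PostgreSQL's quote_literal() function. It is not MADlib's actual code. It assumes standard_conforming_strings = on, so backslashes need no escaping. Inside PL/Python, plpy.quote_literal() serves the same purpose.

```python
def quote_literal(value):
    """Return value as a SQL string literal by doubling embedded single
    quotes (assumes standard_conforming_strings = on)."""
    return "'" + value.replace("'", "''") + "'"


dep = "(data->>'y')::int"
print(", {0}::varchar as dependent_varname".format(quote_literal(dep)))
# , '(data->>''y'')::int'::varchar as dependent_varname
```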



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
