[
https://issues.apache.org/jira/browse/MADLIB-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461501#comment-16461501
]
Himanshu Pandey commented on MADLIB-1172:
-----------------------------------------
Hi [~fmcquillan],
I tested this on Greenplum 4.3.25; the results are below.
Both the singular and the separated data sets work fine and return output, but
the regular data set, load-data.sql, returns an empty model table, which
differs from the original issue.
{code:java}
[gpadmin@gpdb ~]$ psql -f load-data.sql
psql:load-data.sql:1: NOTICE: table "dummy_data" does not exist, skipping
DROP TABLE
psql:load-data.sql:2: NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
[gpadmin@gpdb ~]$ psql
psql (8.2.15)
Type "help" for help.
gpadmin=# \dt
               List of relations
 Schema |    Name    | Type  |  Owner  | Storage
--------+------------+-------+---------+---------
 public | dummy_data | table | gpadmin | heap
(1 row)
gpadmin=# select madlib.logregr_train('dummy_data','dummy_logit_gp','y','ARRAY[1,x1,x2,x3,x4,x5]',NULL,20,'irls');
logregr_train
---------------
(1 row)
gpadmin=# select * from dummy_logit_gp;
 coef | log_likelihood | std_err | z_stats | p_values | odds_ratios | condition_no | num_rows_processed | num_missing_rows_skipped | num_iterations | variance_covariance
------+----------------+---------+---------+----------+-------------+--------------+--------------------+--------------------------+----------------+---------------------
      |                |         |         |          |             |              |                    |                          |              4 |
(1 row)
gpadmin=#
{code}
> Logistic regression produces empty output table but no error message on
> Greenplum
> ---------------------------------------------------------------------------------
>
> Key: MADLIB-1172
> URL: https://issues.apache.org/jira/browse/MADLIB-1172
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Logistic Regression
> Reporter: Frank McQuillan
> Assignee: Himanshu Pandey
> Priority: Minor
> Fix For: v1.15
>
> Attachments: Logistic-regression-empty-output.ipynb,
> load-data-sep.sql, load-data-singular.sql, load-data.sql
>
>
> Separated and singular data sets may produce an empty model table on
> Greenplum 4.3.x. On Postgres 9.6 the same example works OK.
> See the attached Jupyter notebook and data sets for details.