[ https://issues.apache.org/jira/browse/MADLIB-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan updated MADLIB-896:
-----------------------------------
    Labels: starter  (was: gsoc2016 starter)

> PivotalR test failures indicate potential bugs in MADlib GLM
> ------------------------------------------------------------
>
>                 Key: MADLIB-896
>                 URL: https://issues.apache.org/jira/browse/MADLIB-896
>             Project: Apache MADlib
>          Issue Type: Bug
>            Reporter: Xixuan (Aaron) Feng
>            Assignee: Rahul Iyer
>              Labels: starter
>
> These problems may just be numerical issues caused by condition numbers 
> that are too large or a training set that is too small. To be investigated.
> {code}
> > PivotalR:::test(filter="glm")
> Running tests -------------------------
> Test cases for madlib.glm and its helper functions : 
> .port? 5431
> .dbname? madlib-pg93
> ....................
> WARNING:  GLM warning: the computation did not converge in 20 iterations!
> CONTEXT:  PL/Python function "glm"
> 1.2.................
> WARNING:  GLM warning: the computation did not converge in 20 iterations!
> CONTEXT:  PL/Python function "glm"
> ....
> WARNING:  Hessian or gradient is not finite.
> CONTEXT:  SQL statement "
>             SELECT
>                 __madlib_temp_75741577_1437438071_4375895__ AS __madlib_temp_75741577_1437438071_4375895__,
>                 sex ,
>                 4 AS __madlib_temp_48749745_1437438071_22969480__,
>                 (
>                 madlib.__glm_binomial_probit_agg(
>                     ((("rings") < (10))::integer)::double precision,
>                     (array[1,"length","diameter","height","whole","shucked","viscera","shell"])::double precision[],
>                     __madlib_temp_43345218_1437438071_11277539__.__madlib_temp_69537766_1437438071_32811656__)
>                 ) AS __madlib_temp_69537766_1437438071_32811656__
>             FROM
>             (
>                 SELECT
>                     *,
>                     array_to_string(ARRAY[sex::text],
>                                     ','
>                                    ) AS __madlib_temp_75741577_1437438071_4375895__
>                 FROM "pg_temp_3"."madlib_temp_d763e98a_0753_969a95_03cedf5694ab"
>             ) AS _src
>             JOIN
>             (
>                 SELECT
>                     unnest($1) AS __madlib_temp_75741577_1437438071_4375895__,
>                     unnest($2) AS __madlib_temp_69537766_1437438071_32811656__
>             ) AS __madlib_temp_43345218_1437438071_11277539__
>             USING (__madlib_temp_75741577_1437438071_4375895__)
>             GROUP BY sex, __madlib_temp_75741577_1437438071_4375895__
>             "
> PL/Python function "glm"
> WARNING:  Hessian or gradient is not finite.
> CONTEXT:  SQL statement "
>             SELECT
>                 __madlib_temp_75741577_1437438071_4375895__ AS __madlib_temp_75741577_1437438071_4375895__,
>                 sex ,
>                 5 AS __madlib_temp_48749745_1437438071_22969480__,
>                 (
>                 madlib.__glm_binomial_probit_agg(
>                     ((("rings") < (10))::integer)::double precision,
>                     (array[1,"length","diameter","height","whole","shucked","viscera","shell"])::double precision[],
>                     __madlib_temp_43345218_1437438071_11277539__.__madlib_temp_69537766_1437438071_32811656__)
>                 ) AS __madlib_temp_69537766_1437438071_32811656__
>             FROM
>             (
>                 SELECT
>                     *,
>                     array_to_string(ARRAY[sex::text],
>                                     ','
>                                    ) AS __madlib_temp_75741577_1437438071_4375895__
>                 FROM "pg_temp_3"."madlib_temp_d763e98a_0753_969a95_03cedf5694ab"
>             ) AS _src
>             JOIN
>             (
>                 SELECT
>                     unnest($1) AS __madlib_temp_75741577_1437438071_4375895__,
>                     unnest($2) AS __madlib_temp_69537766_1437438071_32811656__
>             ) AS __madlib_temp_43345218_1437438071_11277539__
>             USING (__madlib_temp_75741577_1437438071_4375895__)
>             GROUP BY sex, __madlib_temp_75741577_1437438071_4375895__
>             "
> PL/Python function "glm"
> 34..............5..........................
> 1. Failure (at test-madlib_glm.r#78): Test gaussian(inverse) ------------------------------------------
> fit.db$coef not equal to fit.r$coefficients[, 1]
> 8/8 mismatches (average diff: 0.00719).
> First 8:
>  pos       x       y     diff
>    1  0.1970  0.1990 -0.00196
>    2 -0.0243 -0.0254  0.00112
>    3 -0.1709 -0.1630 -0.00793
>    4 -0.2059 -0.2462  0.04027
>    5 -0.0476 -0.0465 -0.00112
>    6  0.1413  0.1397  0.00156
>    7  0.0564  0.0577 -0.00130
>    8 -0.0146 -0.0123 -0.00222
> 2. Failure (at test-madlib_glm.r#86): Test gaussian(inverse) with categorical features ----------------
> fit.db$coef not equal to fit.r$coefficients[, 1]
> 10/10 mismatches (average diff: 0.00517).
> First 10:
>  pos        x        y      diff
>    1  0.18215  0.18410 -1.94e-03
>    2  0.01223  0.01214  8.72e-05
>    3 -0.00158 -0.00153 -4.83e-05
>    4 -0.02981 -0.03107  1.26e-03
>    5 -0.13631 -0.12955 -6.76e-03
>    6 -0.19904 -0.23515  3.61e-02
>    7 -0.04775 -0.04668 -1.07e-03
>    8  0.14030  0.13905  1.26e-03
>    9  0.06185  0.06311 -1.26e-03
>   10 -0.01741 -0.01550 -1.91e-03
> 3. Failure (at test-madlib_glm.r#154): Test binomial(probit) with grouping ----------------------------
> fit.db[[1]]$coef not equal to fit.r[[1]]$coefficients[, 1]
> 8/8 mismatches (average diff: 3.43).
> First 8:
>  pos      x      y   diff
>    1   2.79   1.73  1.063
>    2   5.41   5.73 -0.317
>    3  -3.23  -1.48 -1.742
>    4 -12.52  -9.37 -3.157
>    5 -16.51 -11.62 -4.893
>    6  21.90  16.00  5.899
>    7  13.38   7.96  5.423
>    8   2.33  -2.62  4.957
> 4. Failure (at test-madlib_glm.r#155): Test binomial(probit) with grouping ----------------------------
> fit.db[[1]]$std_err not equal to fit.r[[1]]$coefficients[, 2]
> 8/8 mismatches (average diff: Inf).
> First 8:
>  pos     x   y diff
>    1 0.582 Inf -Inf
>    2 2.559 Inf -Inf
>    3 3.334 Inf -Inf
>    4 4.176 Inf -Inf
>    5 2.934 Inf -Inf
>    6 3.257 Inf -Inf
>    7 3.928 Inf -Inf
>    8 3.629 Inf -Inf
> 5. Failure (at test-madlib_glm.r#214): Test poisson(identity) with grouping ---------------------------
> fit.db[[1]]$coef not equal to fit.r[[1]]$coefficients[, 1]
> 8/8 mismatches (average diff: 0.13).
> First 8:
>  pos     x     y     diff
>    1  2.74  2.75 -0.00483
>    2 -1.76 -1.78  0.02177
>    3  5.83  5.81  0.02412
>    4 27.36 27.45 -0.08863
>    5  2.67  2.44  0.22605
>    6 -7.71 -7.38 -0.32432
>    7 -5.89 -5.72 -0.16966
>    8 14.88 15.06 -0.17732
> Error: Test failures
> In addition: Warning messages:
> 1: glm.fit: algorithm did not converge 
> 2: glm.fit: algorithm did not converge 
> 3: glm.fit: algorithm did not converge 
> 4: glm.fit: fitted probabilities numerically 0 or 1 occurred 
> 5: glm.fit: algorithm did not converge 
> 6: glm.fit: fitted probabilities numerically 0 or 1 occurred 
> {code}
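
One way to judge whether a mismatch like failure 1 is numerical noise is to
recompute the summary from the printed columns and also look at the
per-coefficient relative difference. The following is only a minimal sketch in
Python, not part of PivotalR or MADlib. It assumes the "x" column is the MADlib
fit and the "y" column is the R fit (matching the order in "fit.db$coef not
equal to fit.r$coefficients[, 1]"), that "average diff" is a mean absolute
difference, and it uses the values as rounded in the log. The helper names are
hypothetical.

```python
# Coefficients copied from failure 1 in the log (rounded as printed there).
# Assumption: "x" is the MADlib fit (fit.db$coef), "y" is the R fit.
madlib_coef = [0.1970, -0.0243, -0.1709, -0.2059, -0.0476, 0.1413, 0.0564, -0.0146]
r_coef = [0.1990, -0.0254, -0.1630, -0.2462, -0.0465, 0.1397, 0.0577, -0.0123]


def mean_abs_diff(xs, ys):
    """Mean of |x - y| over paired values (hypothetical helper)."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def rel_diffs(xs, ys):
    """Per-position |x - y| / |y|, using the R fit as the reference."""
    return [abs(x - y) / abs(y) for x, y in zip(xs, ys)]


avg = mean_abs_diff(madlib_coef, r_coef)
rel = rel_diffs(madlib_coef, r_coef)
worst = max(rel)
worst_pos = rel.index(worst) + 1  # 1-based, to match the "pos" column

print("mean absolute difference:", round(avg, 5))
print("largest relative difference:", round(worst, 3), "at pos", worst_pos)
```

The mean absolute difference comes out close to the 0.00719 reported in the
log, which is consistent with the assumption about how "average diff" is
computed. Relative differences of several percent or more on individual
coefficients would be larger than typical floating-point round-off, so they
would point toward convergence or conditioning differences rather than
rounding alone.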



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
