[ https://issues.apache.org/jira/browse/MADLIB-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frank McQuillan updated MADLIB-896:
-----------------------------------
Labels: starter (was: gsoc2016 starter)
> PivotalR test failures indicate potential bugs in MADlib GLM
> ------------------------------------------------------------
>
> Key: MADLIB-896
> URL: https://issues.apache.org/jira/browse/MADLIB-896
> Project: Apache MADlib
> Issue Type: Bug
> Reporter: Xixuan (Aaron) Feng
> Assignee: Rahul Iyer
> Labels: starter
>
> These problems may just be numerical issues caused by condition numbers
> that are too large or a training set that is too small. To be investigated.
> {code}
> > PivotalR:::test(filter="glm")
> Running tests -------------------------
> Test cases for madlib.glm and its helper functions :
> .port? 5431
> .dbname? madlib-pg93
> ....................
> WARNING: GLM warning: the computation did not converge in 20 iterations!
> CONTEXT: PL/Python function "glm"
> 1.2.................
> WARNING: GLM warning: the computation did not converge in 20 iterations!
> CONTEXT: PL/Python function "glm"
> ....
> WARNING: Hessian or gradient is not finite.
> CONTEXT: SQL statement "
> SELECT
> __madlib_temp_75741577_1437438071_4375895__ AS
> __madlib_temp_75741577_1437438071_4375895__,
> sex ,
> 4 AS __madlib_temp_48749745_1437438071_22969480__,
> (
> madlib.__glm_binomial_probit_agg(
> ((("rings") < (10))::integer)::double precision,
>
> (array[1,"length","diameter","height","whole","shucked","viscera","shell"])::double precision[],
>
> __madlib_temp_43345218_1437438071_11277539__.__madlib_temp_69537766_1437438071_32811656__)
> ) AS __madlib_temp_69537766_1437438071_32811656__
> FROM
> (
> SELECT
> *,
> array_to_string(ARRAY[sex::text],
> ','
> ) AS
> __madlib_temp_75741577_1437438071_4375895__
> FROM
> "pg_temp_3"."madlib_temp_d763e98a_0753_969a95_03cedf5694ab"
> ) AS _src
> JOIN
> (
> SELECT
> unnest($1) AS __madlib_temp_75741577_1437438071_4375895__,
> unnest($2) AS __madlib_temp_69537766_1437438071_32811656__
> ) AS __madlib_temp_43345218_1437438071_11277539__
> USING (__madlib_temp_75741577_1437438071_4375895__)
> GROUP BY sex, __madlib_temp_75741577_1437438071_4375895__
> "
> PL/Python function "glm"
> WARNING: Hessian or gradient is not finite.
> CONTEXT: SQL statement "
> SELECT
> __madlib_temp_75741577_1437438071_4375895__ AS
> __madlib_temp_75741577_1437438071_4375895__,
> sex ,
> 5 AS __madlib_temp_48749745_1437438071_22969480__,
> (
> madlib.__glm_binomial_probit_agg(
> ((("rings") < (10))::integer)::double precision,
>
> (array[1,"length","diameter","height","whole","shucked","viscera","shell"])::double precision[],
>
> __madlib_temp_43345218_1437438071_11277539__.__madlib_temp_69537766_1437438071_32811656__)
> ) AS __madlib_temp_69537766_1437438071_32811656__
> FROM
> (
> SELECT
> *,
> array_to_string(ARRAY[sex::text],
> ','
> ) AS
> __madlib_temp_75741577_1437438071_4375895__
> FROM
> "pg_temp_3"."madlib_temp_d763e98a_0753_969a95_03cedf5694ab"
> ) AS _src
> JOIN
> (
> SELECT
> unnest($1) AS __madlib_temp_75741577_1437438071_4375895__,
> unnest($2) AS __madlib_temp_69537766_1437438071_32811656__
> ) AS __madlib_temp_43345218_1437438071_11277539__
> USING (__madlib_temp_75741577_1437438071_4375895__)
> GROUP BY sex, __madlib_temp_75741577_1437438071_4375895__
> "
> PL/Python function "glm"
> 34..............5..........................
> 1. Failure (at test-madlib_glm.r#78): Test gaussian(inverse) ------------------------------------------
> fit.db$coef not equal to fit.r$coefficients[, 1]
> 8/8 mismatches (average diff: 0.00719).
> First 8:
> pos x y diff
> 1 0.1970 0.1990 -0.00196
> 2 -0.0243 -0.0254 0.00112
> 3 -0.1709 -0.1630 -0.00793
> 4 -0.2059 -0.2462 0.04027
> 5 -0.0476 -0.0465 -0.00112
> 6 0.1413 0.1397 0.00156
> 7 0.0564 0.0577 -0.00130
> 8 -0.0146 -0.0123 -0.00222
> 2. Failure (at test-madlib_glm.r#86): Test gaussian(inverse) with categorical features ----------------
> fit.db$coef not equal to fit.r$coefficients[, 1]
> 10/10 mismatches (average diff: 0.00517).
> First 10:
> pos x y diff
> 1 0.18215 0.18410 -1.94e-03
> 2 0.01223 0.01214 8.72e-05
> 3 -0.00158 -0.00153 -4.83e-05
> 4 -0.02981 -0.03107 1.26e-03
> 5 -0.13631 -0.12955 -6.76e-03
> 6 -0.19904 -0.23515 3.61e-02
> 7 -0.04775 -0.04668 -1.07e-03
> 8 0.14030 0.13905 1.26e-03
> 9 0.06185 0.06311 -1.26e-03
> 10 -0.01741 -0.01550 -1.91e-03
> 3. Failure (at test-madlib_glm.r#154): Test binomial(probit) with grouping ----------------------------
> fit.db[[1]]$coef not equal to fit.r[[1]]$coefficients[, 1]
> 8/8 mismatches (average diff: 3.43).
> First 8:
> pos x y diff
> 1 2.79 1.73 1.063
> 2 5.41 5.73 -0.317
> 3 -3.23 -1.48 -1.742
> 4 -12.52 -9.37 -3.157
> 5 -16.51 -11.62 -4.893
> 6 21.90 16.00 5.899
> 7 13.38 7.96 5.423
> 8 2.33 -2.62 4.957
> 4. Failure (at test-madlib_glm.r#155): Test binomial(probit) with grouping ----------------------------
> fit.db[[1]]$std_err not equal to fit.r[[1]]$coefficients[, 2]
> 8/8 mismatches (average diff: Inf).
> First 8:
> pos x y diff
> 1 0.582 Inf -Inf
> 2 2.559 Inf -Inf
> 3 3.334 Inf -Inf
> 4 4.176 Inf -Inf
> 5 2.934 Inf -Inf
> 6 3.257 Inf -Inf
> 7 3.928 Inf -Inf
> 8 3.629 Inf -Inf
> 5. Failure (at test-madlib_glm.r#214): Test poisson(identity) with grouping ---------------------------
> fit.db[[1]]$coef not equal to fit.r[[1]]$coefficients[, 1]
> 8/8 mismatches (average diff: 0.13).
> First 8:
> pos x y diff
> 1 2.74 2.75 -0.00483
> 2 -1.76 -1.78 0.02177
> 3 5.83 5.81 0.02412
> 4 27.36 27.45 -0.08863
> 5 2.67 2.44 0.22605
> 6 -7.71 -7.38 -0.32432
> 7 -5.89 -5.72 -0.16966
> 8 14.88 15.06 -0.17732
> Error: Test failures
> In addition: Warning messages:
> 1: glm.fit: algorithm did not converge
> 2: glm.fit: algorithm did not converge
> 3: glm.fit: algorithm did not converge
> 4: glm.fit: fitted probabilities numerically 0 or 1 occurred
> 5: glm.fit: algorithm did not converge
> 6: glm.fit: fitted probabilities numerically 0 or 1 occurred
> {code}
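
The description suggests that large condition numbers may explain the coefficient mismatches. As a rough illustration only (this is not MADlib or PivotalR code, and the vectors below are made-up values rather than columns of the test data set), the following Python sketch shows how two nearly collinear features inflate the condition number of X^T X, which is the kind of situation in which two solvers can stop at visibly different coefficients. The helper names `gram` and `cond_sym_2x2` are hypothetical.

```python
import math

def gram(x1, x2):
    """Return the entries (a, b, d) of X^T X = [[a, b], [b, d]]
    for a design matrix X with the two columns x1 and x2."""
    a = sum(u * u for u in x1)
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2)
    return a, b, d

def cond_sym_2x2(a, b, d):
    """2-norm condition number of the symmetric positive-definite
    matrix [[a, b], [b, d]], from its closed-form eigenvalues."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    lam_max = mean + radius
    # lam_min = det / lam_max avoids the cancellation in (mean - radius).
    lam_min = (a * d - b * b) / lam_max
    return lam_max / lam_min

# Illustrative values only.
x1 = [1.0, 2.0, 3.0, 4.0]
x_indep = [1.0, -1.0, 1.0, -1.0]      # weakly correlated with x1
x_collinear = [1.01, 2.0, 3.0, 4.01]  # almost a copy of x1

cond_indep = cond_sym_2x2(*gram(x1, x_indep))
cond_collinear = cond_sym_2x2(*gram(x1, x_collinear))

print(f"condition number, weakly correlated features: {cond_indep:.3g}")
print(f"condition number, nearly collinear features:  {cond_collinear:.3g}")
```

If the real design matrix turns out to be similarly ill-conditioned, small differences in iteration count or stopping rule between the two implementations could plausibly account for the smaller mismatches above. The Inf standard errors in the probit case would still need a separate look.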