[ https://issues.apache.org/jira/browse/MADLIB-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159924#comment-15159924 ]
Frank McQuillan commented on MADLIB-895:
----------------------------------------
The SQL does have a couple of mistakes in it, but the result in the current
docs is OK. The docs also need some clarification.
The Chi-squared independence test actually uses the Chi-squared
goodness-of-fit function, as shown in the example below. The expected value
needs to be computed in the SQL and passed to the goodness-of-fit function.
For MADlib, the expected value of each element of the input matrix is
computed as (sum of its row) * (sum of its column). For example, the expected
value for element (2,1) would be (sum of row 2) * (sum of column 1).
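
To make the expected-value rule concrete, here is a minimal sketch in plain
Python rather than MADlib SQL. The function names and the 2x2 input matrix are
hypothetical. The rescaling step is an assumption: the sketch assumes the
goodness-of-fit function rescales the expected values so that they sum to the
observed total, which is what would allow the unnormalized product of row sum
and column sum to be passed in directly.

```python
def independence_expected(observed):
    """Unnormalized expected value per cell: (row sum) * (column sum)."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    return [[r * c for c in col_sums] for r in row_sums]


def chi2_statistic(observed, expected):
    """Chi-squared statistic, after rescaling expected to the observed total.

    The rescaling is an assumption about how the goodness-of-fit step
    treats unnormalized expected values.
    """
    obs = [v for row in observed for v in row]
    exp = [v for row in expected for v in row]
    scale = sum(obs) / sum(exp)
    return sum((o - e * scale) ** 2 / (e * scale) for o, e in zip(obs, exp))


# Hypothetical 2x2 contingency table, for illustration only.
observed = [[10, 20], [30, 40]]
expected = independence_expected(observed)

# Element (2,1): sum of row 2 (70) * sum of column 1 (40) = 2800.
statistic = chi2_statistic(observed, expected)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected, statistic, df)
```

After rescaling, the expected values in this sketch become 12, 18, 28 and 42,
which are the usual (row sum * column sum) / total values for an independence
test.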
> Incorrect examples in hypothesis tests documentation
> ----------------------------------------------------
>
> Key: MADLIB-895
> URL: https://issues.apache.org/jira/browse/MADLIB-895
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Inferential Statistics
> Reporter: Rahul Iyer
> Assignee: Frank McQuillan
> Priority: Minor
> Fix For: v1.9
>
>
> The SQL and results in the example for the Chi-squared tests are wrong. The
> documentation shows the result as
> {code}
>     statistic     |       p_value        | df |       phi        | contingency_coef
> ------------------+----------------------+----+------------------+-------------------
>  138.289841626008 | 2.32528678709871e-25 |  9 | 2.93991753313346 | 0.946730727519112
> {code}
> whereas it should be,
> {code}
>     statistic     |       p_value        | df |       phi       | contingency_coef
> ------------------+----------------------+----+-----------------+-------------------
>  320.125868955635 | 1.39464882809491e-63 |  9 | 4.4730154045931 | 0.975909209031126
> (1 row)
> {code}
> The SQL also has a couple of errors and does not run as is.