Afra Ahmad, Jimmy Skuros, and Matt Gitelis
sent me a question about the chi-squared independence test, which I have pasted below along with my response:

"...regarding an issue we encountered with the "*Chi-squared independence test*" in Madlib. We are huge fans of Madlib but are having trouble implementing this one test. Can you please confirm that the documentation below is correct (from most recent docs here: http://doc.madlib.net/latest/group__grp__stats__tests.html)? Also, what are we supposed to do to calculate the expected values? Any pointers would be greatly appreciated!

Thanks,
Matt"

From Frank:

"The MADlib software is correct; only the docs are wrong. I have already fixed them and made a pull request. The JIRA is https://issues.apache.org/jira/browse/MADLIB-895

The correct query for the chi-squared independence test is attached.

How to calculate the expected value: The chi-squared independence test actually uses the chi-squared goodness-of-fit function. The expected value needs to be computed in the SQL and passed to the goodness-of-fit function. For each element of the input matrix, the expected value for MADlib is computed as the sum of that element's row * the sum of that element's column. For example, the expected value for element (2,1) would be the sum of row 2 * the sum of column 1."
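To make the expected-value rule concrete, here is a minimal sketch in plain Python (not MADlib code and not the attached query). The table values and the function names are made up for illustration. One assumption to flag: the textbook expected count is row sum * column sum / grand total, so the sketch rescales the products so that they add up to the observed total. My understanding is that the goodness-of-fit function does an equivalent rescaling internally, which would explain why the division does not appear in the rule above, but please check that against the MADlib docs.

```python
def expected_products(observed):
    """Return (row sum * column sum) for every cell of a contingency table."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    return [[r * c for c in col_sums] for r in row_sums]


def rescale_to_total(values, total):
    """Rescale a table so that its cells add up to `total`."""
    grand = sum(sum(row) for row in values)
    return [[v * total / grand for v in row] for row in values]


def chi2_statistic(observed, expected):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E over all cells."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2x3 table of observed counts (made-up data).
observed = [[10, 20, 30],
            [20, 20, 20]]

# Unnormalized expected values, as described in the reply above.
products = expected_products(observed)
# Element (2,1): sum of row 2 (60) * sum of column 1 (30) = 1800
element_2_1 = products[1][0]

# Assumption: rescale so the expected values sum to the observed total,
# which gives the textbook value row_sum * col_sum / grand_total.
total_observed = sum(sum(row) for row in observed)
expected = rescale_to_total(products, total_observed)

statistic = chi2_statistic(observed, expected)
# Degrees of freedom for an independence test: (rows - 1) * (columns - 1).
deg_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
```

In SQL the same products would be computed per row of the input table and passed as the expected column to the goodness-of-fit function, together with the degrees of freedom.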
