[ https://issues.apache.org/jira/browse/SPARK-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph K. Bradley updated SPARK-6312:
-------------------------------------
Target Version/s: (was: 1.5.0)
> ChiSqTest should check for too few counts
> -----------------------------------------
>
> Key: SPARK-6312
> URL: https://issues.apache.org/jira/browse/SPARK-6312
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Affects Versions: 1.2.0
> Reporter: Joseph K. Bradley
> Priority: Minor
>
> ChiSqTest assumes that the entries of the contingency matrix are large enough
> (have enough counts) for the central limit theorem to apply. It would be
> reasonable to do one or more of the following:
> * Add a note in the docs about making sure a reasonable number of instances
> is used (or, more precisely, that the contingency table entries have enough
> counts, which also accounts for skewed category distributions).
> * Add a check in the code which could:
> ** Log a warning message
> ** Alter the p-value so that it indicates the test result is not
> significant
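The proposed check in the code can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Spark's ChiSqTest code: it assumes the contingency matrix is given as a list of rows of non-negative counts, it checks the expected counts (row total x column total / grand total) rather than the observed ones, and it uses a threshold of 5, a common rule of thumb for the chi-squared approximation. The names expected_counts, check_expected_counts, guarded_p_value and MIN_EXPECTED are hypothetical. The sketch implements both options from the list: logging a warning, and forcing the p-value to 1.0.

```python
import logging

logger = logging.getLogger("chisq_check")

# Hypothetical threshold: a common rule of thumb asks for every expected
# count to be at least 5 before trusting the chi-squared approximation.
MIN_EXPECTED = 5.0


def expected_counts(observed):
    """Expected cell counts under independence: row_sum * col_sum / total."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    total = sum(row_sums)
    if total == 0:
        raise ValueError("contingency matrix has no counts")
    return [[r * c / total for c in col_sums] for r in row_sums]


def check_expected_counts(observed, min_expected=MIN_EXPECTED):
    """Option 1: log a warning and return the (row, col, expected) cells
    whose expected count falls below min_expected."""
    small = [
        (i, j, e)
        for i, row in enumerate(expected_counts(observed))
        for j, e in enumerate(row)
        if e < min_expected
    ]
    if small:
        logger.warning(
            "%d cell(s) have expected count < %s; chi-squared p-value may be unreliable: %s",
            len(small), min_expected, small,
        )
    return small


def guarded_p_value(p_value, observed, min_expected=MIN_EXPECTED):
    """Option 2: return 1.0 (never significant) if any expected count is
    too small, otherwise pass the computed p-value through unchanged."""
    if check_expected_counts(observed, min_expected):
        return 1.0
    return p_value


if __name__ == "__main__":
    sparse = [[10, 2], [3, 1]]    # grand total 16: three expected counts < 5
    dense = [[50, 30], [40, 60]]  # expected counts 40, 40, 50, 50
    print(guarded_p_value(0.03, sparse))
    print(guarded_p_value(0.01, dense))
```

Forcing the p-value to 1.0 is the bluntest way to make a result read as not significant; logging the offending cells, as check_expected_counts does, leaves the decision to the caller.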
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)