[
https://issues.apache.org/jira/browse/SPARK-15656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15307629#comment-15307629
]
Jieyuan Chen commented on SPARK-15656:
--------------------------------------
Thanks for the answer. I made a mistake: I thought the parameter passed in was
the original random variable values, as with `kolmogorovSmirnovTest`, but it
should actually be frequencies.
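
To illustrate the distinction, here is a minimal sketch in plain Scala (no Spark
dependency) of turning raw sample values into per-bin frequencies and computing
the Pearson chi-squared statistic against equal expected counts. The object and
method names (`ChiSqFrequencies`, `binCounts`, `chiSqUniform`) and the choice of
equal-width bins are assumptions made for this example, not Spark code. The
resulting counts are the kind of input one would presumably pass as the observed
vector to `Statistics.chiSqTest`.

```scala
// Illustrative sketch only: all names below are hypothetical, not part of Spark.
object ChiSqFrequencies {

  /** Count how many samples fall into each of `bins` equal-width bins over [lo, hi). */
  def binCounts(samples: Seq[Double], lo: Double, hi: Double, bins: Int): Array[Double] = {
    val counts = Array.fill(bins)(0.0)
    val width = (hi - lo) / bins
    for (x <- samples if x >= lo && x < hi) {
      // Clamp the index to guard against floating-point rounding at the upper edge.
      val idx = math.min(((x - lo) / width).toInt, bins - 1)
      counts(idx) += 1.0
    }
    counts
  }

  /** Pearson chi-squared statistic of observed counts against equal expected counts. */
  def chiSqUniform(observed: Array[Double]): Double = {
    val expected = observed.sum / observed.length
    observed.map(o => (o - expected) * (o - expected) / expected).sum
  }
}
```

For example, the samples 0.05, 0.15, 0.4, 0.45, 0.5, 0.55, 0.6, 0.95 binned into
4 bins over [0, 1) give the counts 2, 2, 3, 1, and `chiSqUniform` on those counts
returns 1.0. Passing the raw values themselves would instead treat each value as
a count, which is the mistake described above.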
> ChiSqTest for goodness of fit doesn't test against a wrong uniform
> distribution by default
> ------------------------------------------------------------------------------------------
>
> Key: SPARK-15656
> URL: https://issues.apache.org/jira/browse/SPARK-15656
> Project: Spark
> Issue Type: Bug
> Components: MLlib
> Affects Versions: 1.5.1, 1.6.1
> Reporter: Jieyuan Chen
> Labels: easyfix, mllib, stats
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
> I've been running a ChiSqTest to test whether my samples fit a uniform
> distribution.
> The documentation says that if a second vector to test against is not
> supplied as a parameter, the test runs against a uniform distribution. But
> when I pass samples drawn from a normal distribution, the p-value calculated
> is 1.0, which is wrong.
> The problem is that in ChiSqTest.scala, the `chiSquared` function will
> generate a wrong uniform distribution if the expected vector is not supplied.
> The default generated samples should be
> val expArr = if (expected.size == 0) Array.tabulate(size)(i => i.toDouble / size)
>   else expected.toArray