[ https://issues.apache.org/jira/browse/MATH-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091236#comment-15091236 ]
Phil Steitz commented on MATH-1313:
-----------------------------------

The current test is definitely weak and the proposal above is better, but it
could be made better still by looking not just at the mean, but the
distribution itself. I suspect the test was implemented before we had
ChiSquare or Kolmogorov-Smirnov tests available, either of which could be used
to evaluate conformity of the distribution.

The most direct would be to do a one-sample KS test with a Uniform
distribution instance as the RealDistribution. If the p-value returned is
small, say less than .001, you fail the test. Alternatively, you could set up
equal-sized bins and do a ChiSquare test with expected bin counts all equal to
the sample size / number of bins (this is what
RealDistributionAbstractTest#testSampling does).

> Wrong tolerance in some unit tests of "RandomGeneratorAbstractTest"
> -------------------------------------------------------------------
>
>                 Key: MATH-1313
>                 URL: https://issues.apache.org/jira/browse/MATH-1313
>             Project: Commons Math
>          Issue Type: Bug
>            Reporter: Gilles
>            Assignee: Gilles
>            Priority: Minor
>              Labels: unit-test
>             Fix For: 4.0
>
>
> I doubt that the mean check in the unit test below is ever going to trigger
> an assertion failure...
> {noformat}
>     @Test
>     public void testDoubleDirect() {
>         SummaryStatistics sample = new SummaryStatistics();
>         final int N = 10000;
>         for (int i = 0; i < N; ++i) {
>             sample.addValue(generator.nextDouble());
>         }
>         Assert.assertEquals("Note: This test will fail randomly about 1 in 100 times.",
>                 0.5, sample.getMean(), FastMath.sqrt(N/12.0) * 2.576);
>         Assert.assertEquals(1.0 / (2.0 * FastMath.sqrt(3.0)),
>                 sample.getStandardDeviation(), 0.01);
>     }
> {noformat}
> And similar in "testFloatDirect()".
> I propose the following replacement:
> {noformat}
>     @Test
>     public void testDoubleDirect() {
>         SummaryStatistics sample = new SummaryStatistics();
>         final int N = 100000;
>         for (int i = 0; i < N; ++i) {
>             sample.addValue(generator.nextDouble());
>         }
>         assertUniformInUnitInterval(sample, 0.99);
>     }
> {noformat}
> where "assertUniformInUnitInterval" is defined as:
> {noformat}
>     /**
>      * Check that the sample follows a uniform distribution on the {@code [0, 1)} interval.
>      *
>      * @param sample Data summary.
>      * @param confidenceIntervalLevel Confidence level. Must be in {@code (0, 1)} interval.
>      */
>     private void assertUniformInUnitInterval(SummaryStatistics sample,
>                                              double confidenceIntervalLevel) {
>         final int numSamples = (int) sample.getN();
>         final double mean = sample.getMean();
>         final double stddev = sample.getStandardDeviation() / FastMath.sqrt(numSamples);
>         final TDistribution t = new TDistribution(numSamples - 1);
>         final double criticalValue = t.inverseCumulativeProbability(1 - 0.5 * (1 - confidenceIntervalLevel));
>         final double tol = stddev * criticalValue;
>         Assert.assertEquals("mean=" + mean + " tol=" + tol + " (note: This test will fail randomly about " +
>                             (100 * (1 - confidenceIntervalLevel)) + " in 100 times).",
>                             0.5, mean, tol);
>         Assert.assertEquals(FastMath.sqrt(1d / 12),
>                             sample.getStandardDeviation(), 0.01);
>     }
> {noformat}
> Please correct if this new test is not what was intended.
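
For illustration, here is a minimal, dependency-free sketch of the one-sample
KS check against Uniform[0, 1) that the comment above suggests. It is not the
Commons Math implementation: in the real test one would presumably call the
library's KolmogorovSmirnovTest with a uniform RealDistribution, as the comment
describes. The class name UniformKsCheck, its method names, and the use of
java.util.Random as a stand-in for the generator under test are all assumptions
made for this example. The p-value uses the asymptotic Kolmogorov series, so it
is only a reasonable approximation for large samples.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper: one-sample KS check against Uniform[0, 1). */
final class UniformKsCheck {

    /** KS statistic D: largest gap between the empirical CDF and F(x) = x. */
    static double ksStatistic(double[] sample) {
        final double[] x = sample.clone();
        Arrays.sort(x);
        final int n = x.length;
        double d = 0;
        for (int i = 0; i < n; i++) {
            final double above = (i + 1.0) / n - x[i];  // ECDF just after x[i] minus F
            final double below = x[i] - (double) i / n; // F minus ECDF just before x[i]
            d = Math.max(d, Math.max(above, below));
        }
        return d;
    }

    /** Asymptotic p-value P(D > d) from the Kolmogorov series (large n only). */
    static double ksPValue(double d, int n) {
        final double lambda = Math.sqrt(n) * d;
        if (lambda < 0.2) {
            // The series converges slowly here and the p-value is practically 1.
            return 1.0;
        }
        double sum = 0;
        double sign = 1;
        for (int k = 1; k <= 100; k++) {
            final double term = Math.exp(-2.0 * k * k * lambda * lambda);
            sum += sign * term;
            if (term < 1e-16) {
                break;
            }
            sign = -sign;
        }
        return Math.min(1.0, Math.max(0.0, 2.0 * sum));
    }

    public static void main(String[] args) {
        // Assumption: java.util.Random stands in for the generator under test.
        final Random generator = new Random(12345L);
        final int n = 100000;
        final double[] sample = new double[n];
        for (int i = 0; i < n; i++) {
            sample[i] = generator.nextDouble();
        }
        final double d = ksStatistic(sample);
        final double p = ksPValue(d, n);
        System.out.println("D=" + d + " p=" + p);
        // Threshold taken from the comment: fail when the p-value is below .001.
        if (p < 0.001) {
            throw new AssertionError("Sample does not look uniform: p=" + p);
        }
    }
}
```

Unlike a check on the mean and standard deviation alone, this rejects samples
whose first two moments look plausible but whose shape is wrong, at the cost of
still failing randomly for a good generator about 1 run in 1000 at the .001
threshold.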