[ https://issues.apache.org/jira/browse/CRUNCH-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629286#comment-14629286 ]
Gabriel Reid commented on CRUNCH-542:
-------------------------------------

FWIW, I think that just using the seeded version of the test is fine (that's what is done in o.a.c.lib.SampleTest). Checking that it's within 5 standard deviations isn't that far away from not checking it at all, is it?

Another option might be to do three un-seeded calls to sample and then calculate the average.

> Wider tolerance for flaky scrunch PCollectionTest
> -------------------------------------------------
>
>                 Key: CRUNCH-542
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-542
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Scrunch
>    Affects Versions: 0.10.0, 0.11.0, 0.12.0
>            Reporter: Josh Wills
>            Priority: Minor
>             Fix For: 0.13.0
>
>         Attachments: CRUNCH-542.patch
>
>
> One of the Scrunch tests uses an unseeded version of the sample() function
> that verifies that it works correctly by ensuring that an actual sampling of
> elements is within ~ 3 standard deviations of the expected value. Given this,
> we expect the test to fail about once every 370 times it is run, or once a
> year if the tests were run every day.
> My issue is that we test about a dozen versions of Crunch automatically in
> Jenkins every day, and so I'm having this test fail on at least some version
> about once every month. I'd like to bump the control limit up to a little
> over 5 standard deviations so that the test fails around once every
> millennium and/or get rid of the test entirely and only rely on the seeded
> versions of the test.
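
The failure rates quoted in the issue description can be checked with a short calculation. This is only a sketch: it assumes the sampled count is approximately normally distributed, that the tolerance is two-sided, and that "about a dozen versions" means 12 runs per day. The function names are hypothetical and are not part of Crunch or Scrunch.

```python
import math


def two_sided_tail(k):
    """Probability that a normal variate lands more than k standard
    deviations away from its mean (both tails combined)."""
    return math.erfc(k / math.sqrt(2))


def days_between_failures(k, runs_per_day):
    """Expected number of days between spurious failures when the test
    runs runs_per_day times a day with a k-sigma tolerance."""
    return 1.0 / (two_sided_tail(k) * runs_per_day)


if __name__ == "__main__":
    runs_per_day = 12  # assumption: "about a dozen versions" tested daily
    for k in (3.0, 5.0):
        p = two_sided_tail(k)
        print(f"{k:.0f} sigma: one failure per {1.0 / p:,.0f} runs, "
              f"about {days_between_failures(k, runs_per_day):,.0f} days "
              f"at {runs_per_day} runs/day")
```

Under these assumptions a 3 sigma limit gives roughly one spurious failure per 370 runs, or about one a month at 12 runs per day, which matches the description. A 5 sigma limit gives roughly one per 1.7 million runs, which is on the order of 400 years at 12 runs per day, so a limit a little over 5 sigma is what pushes the interval toward a millennium.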