[ https://issues.apache.org/jira/browse/LUCENE-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123710#comment-13123710 ]
Doron Cohen edited comment on LUCENE-3501 at 10/9/11 3:50 PM: -------------------------------------------------------------- Before applying this patch should do: {noformat} svn mv modules/facet/src/java/org/apache/lucene/facet/util/RandomSample.java modules/facet/src/java/org/apache/lucene/facet/search/sampling/RepeatableSampler.java {noformat} I looked at this, and also discussed with Gilad, and it seems that: * The test is broken as it claims to do N trials in case of failure but it does not, because its try/catch does not catch AssertionError, and so only one trial is attempted. (Few trials make sense because with sampling, there is always a possibility that the selected sample set of docs would not contain the "correct" best facets even with a high over sampling ratio (over sampling means that for the selected set of docs more best facets are collected). * Even after fixing the test to actually try more than once, it still fails, because there is no randomness in RandomSample... surprising but true. In this patch: * Sampler made an abstract class. * RandomSample renamed to RepeatableSampler which extends Sampler. * RandomSampler was added - it too extends Sampler - this is a simple *random* implementation, which is now the default (used by default in SamplingWrapperAccumulator). * The test randomly selects between the two sampler implementations. If you want to see the behavior that created the bug, remove that latter randomness by setting to false the variable *useRandomSampler* of *BaseSampleTestTopK.testCountUsingSamping()*. I think this is ready to commit. Wasn't sure though, where should the Changes entry go? was (Author: doronc): Before applying this patch should do: {noformat} svn mv modules/facet/src/java/org/apache/lucene/facet/util/RandomSample.java modules/facet/src/java/org/apache/lucene/facet/search/sampling/RepeatableSampler.java {noformat} I looked at this, and also discussed with Gilad, and it seems that: * The test is broken as it claims to do N trials in case of failure but it does not, because its try/catch does not catch AssertionError, and so only one trial is attempted. (Few trials make sense because with sampling, there is always a possibility that the selected sample set of docs would not contain the "correct" best facets even with a high over sampling ratio (over sampling means that for the selected set of docs more best facets are collected). * Even after fixing the test to actually try more than once, it still fails, because there is no randomness in RandomSample... surprising but true. In this patch: * Sampler made an abstract class. * RandomSample renamed to RepeatableSampler which extends RandomSampler. * RandomSampler was added - it too extends Sampler - this is a simple *random* implementation, which is now the default (used by default in SamplingWrapperAccumulator). * The test randomly selects between the two sampler implementations. If you want to see the behavior that created the bug, remove that latter randomness by setting to false the variable *useRandomSampler* of *BaseSampleTestTopK.testCountUsingSamping()*. I think this is ready to commit. Wasn't sure though, where should the Changes entry go? > random sampler is not random (and so facet SamplingWrapperTest occasionally > fails) > ---------------------------------------------------------------------------------- > > Key: LUCENE-3501 > URL: https://issues.apache.org/jira/browse/LUCENE-3501 > Project: Lucene - Java > Issue Type: Bug > Components: modules/facet > Reporter: Doron Cohen > Assignee: Doron Cohen > Priority: Minor > Attachments: LUCENE-3501.patch > > > RandomSample is not random at all: > It does not even import java.util.Random, and its behavior is deterministic. > in addition, the test testCountUsingSamping() never retries as it was > supposed to (for taking care of the hoped-for randomness). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org