[ 
https://issues.apache.org/jira/browse/LUCENE-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123710#comment-13123710
 ] 

Doron Cohen edited comment on LUCENE-3501 at 10/9/11 3:50 PM:
--------------------------------------------------------------

Before applying this patch should do:
{noformat}
svn mv modules/facet/src/java/org/apache/lucene/facet/util/RandomSample.java 
modules/facet/src/java/org/apache/lucene/facet/search/sampling/RepeatableSampler.java
{noformat}

I looked at this, and also discussed with Gilad, and it seems that:

* The test is broken as it claims to do N trials in case of failure but it does 
not, because its try/catch does not catch AssertionError, and so only one trial 
is attempted. (Few trials make sense because with sampling, there is always a 
possibility that the selected sample set of docs would not contain the 
"correct" best facets even with a high over sampling ratio (over sampling means 
that for the selected set of docs more best facets are collected).

* Even after fixing the test to actually try more than once, it still fails, 
because there is no randomness in RandomSample...  surprising but true.

In this patch:
* Sampler made an abstract class.
* RandomSample renamed to RepeatableSampler which extends Sampler.
* RandomSampler was added - it too extends Sampler - this is a simple *random* 
implementation, which is now the default (used by default in 
SamplingWrapperAccumulator).
* The test randomly selects between the two sampler implementations. If you 
want to see the behavior that created the bug, remove that latter randomness by 
setting to false the variable *useRandomSampler* of 
*BaseSampleTestTopK.testCountUsingSamping()*.

I think this is ready to commit. 
Wasn't sure though, where should the Changes entry go?
                
      was (Author: doronc):
    Before applying this patch should do:
{noformat}
svn mv modules/facet/src/java/org/apache/lucene/facet/util/RandomSample.java 
modules/facet/src/java/org/apache/lucene/facet/search/sampling/RepeatableSampler.java
{noformat}

I looked at this, and also discussed with Gilad, and it seems that:

* The test is broken as it claims to do N trials in case of failure but it does 
not, because its try/catch does not catch AssertionError, and so only one trial 
is attempted. (Few trials make sense because with sampling, there is always a 
possibility that the selected sample set of docs would not contain the 
"correct" best facets even with a high over sampling ratio (over sampling means 
that for the selected set of docs more best facets are collected).

* Even after fixing the test to actually try more than once, it still fails, 
because there is no randomness in RandomSample...  surprising but true.

In this patch:
* Sampler made an abstract class.
* RandomSample renamed to RepeatableSampler which extends RandomSampler.
* RandomSampler was added - it too extends Sampler - this is a simple *random* 
implementation, which is now the default (used by default in 
SamplingWrapperAccumulator).
* The test randomly selects between the two sampler implementations. If you 
want to see the behavior that created the bug, remove that latter randomness by 
setting to false the variable *useRandomSampler* of 
*BaseSampleTestTopK.testCountUsingSamping()*.

I think this is ready to commit. 
Wasn't sure though, where should the Changes entry go?
                  
> random sampler is not random (and so facet SamplingWrapperTest occasionally 
> fails)
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-3501
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3501
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3501.patch
>
>
> RandomSample is not random at all:
> It does not even import java.util.Random, and its behavior is deterministic.
> in addition, the test testCountUsingSamping() never retries as it was 
> supposed to (for taking care of the hoped-for randomness).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to