[ 
https://issues.apache.org/jira/browse/LUCENE-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilad Barkai updated LUCENE-5015:
---------------------------------

    Attachment: LUCENE-5015.patch

Added a parameter to {{SamplingParams}} named {{fixToExact}}, which defaults to 
{{false}}. 
I think anyone who uses sampling is probably not interested in exact results.

In the proposed approach, the {{Sampler}} creates the old, slow, but accurate 
{{TakmiSampleFixer}} if {{SamplingParams.shouldFixToExact()}} is {{true}}. 
Otherwise the much (much!) faster {{AmortizedSampleFixer}} is used; it takes 
only the sampling ratio into account, assuming the sampled set represents the 
whole set with 100% accuracy.
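
To make the amortized idea concrete, here is a minimal sketch of scaling a sampled count by the sampling ratio. It is not the code from the patch: the class {{AmortizedFixSketch}} and the method {{amortize}} are hypothetical names, and the only assumption is that the estimate is the sampled count divided by the ratio.

```java
// Hypothetical illustration only; this is NOT the AmortizedSampleFixer from the patch.
final class AmortizedFixSketch {
    private AmortizedFixSketch() {}

    /**
     * Scales a count observed on the sampled documents up to an estimate
     * for the full result set, using nothing but the sampling ratio.
     */
    static int amortize(int sampledCount, double sampleRatio) {
        if (sampleRatio <= 0.0 || sampleRatio > 1.0) {
            throw new IllegalArgumentException("sampleRatio must be in (0, 1]: " + sampleRatio);
        }
        // Assumes the sample represents the whole set perfectly.
        return (int) Math.round(sampledCount / sampleRatio);
    }

    public static void main(String[] args) {
        // A category seen 50 times in a 10% sample is estimated at 500 overall.
        System.out.println(amortize(50, 0.1));
    }
}
```

Because no second pass over the full result set is needed, the cost of such a fix does not grow with the number of matching documents, which is where the speed difference to an exact fixer would come from.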

With these changes, the code in the issue description should already use the 
amortized fixer, as it is now the default.
If the old fixer is to be used (for comparison), the code could look as 
follows:

{code}
final FacetSearchParams facetSearchParams = new FacetSearchParams( facetRequests );

FacetsCollector facetsCollector;

if ( isSampled )
{
        // Create SamplingParams that request fixing to exact counts
        SamplingParams samplingParams = new SamplingParams();
        samplingParams.setFixToExact( true );

        // Use the custom sampling params while creating the RandomSampler
        facetsCollector = FacetsCollector.create(
                new SamplingAccumulator(
                        new RandomSampler( samplingParams, new Random( someSeed ) ),
                        facetSearchParams, searcher.getIndexReader(), taxo ) );
}
else
{
        facetsCollector = FacetsCollector.create(
                FacetsAccumulator.create( facetSearchParams, searcher.getIndexReader(), taxo ) );
}
{code}

The sampling tests still use the "exact" fixer, as it is not easy to assert 
against amortized results. I'm still looking into creating a complete faceted 
search flow test with the amortized fixer.
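
One possible way to assert against amortized results is to compare the estimate with the exact count under a relative tolerance rather than requiring equality. The sketch below only illustrates that idea; {{AmortizedAssertSketch}}, {{withinRelativeTolerance}}, and the choice of tolerance are assumptions, not part of the patch or the existing tests.

```java
// Hypothetical helper; not part of the patch or of the Lucene test framework.
final class AmortizedAssertSketch {
    private AmortizedAssertSketch() {}

    /**
     * Returns true if the estimated count deviates from the exact count by
     * at most the given fraction of the exact count (e.g. 0.05 for 5%).
     */
    static boolean withinRelativeTolerance(long exact, long estimated, double tolerance) {
        if (tolerance < 0.0) {
            throw new IllegalArgumentException("tolerance must be non-negative: " + tolerance);
        }
        long diff = Math.abs(exact - estimated);
        // Math.max guards the exact == 0 case so the allowed deviation is never negative.
        return diff <= tolerance * Math.max(1L, exact);
    }

    public static void main(String[] args) {
        // An estimate of 1040 against an exact count of 1000 passes at 5% tolerance.
        System.out.println(withinRelativeTolerance(1000, 1040, 0.05));
    }
}
```

A fixed random seed for the sampler would still be needed for such a check to be repeatable, and a suitable tolerance depends on the sample size.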
                
> Unexpected performance difference between SamplingAccumulator and 
> StandardFacetAccumulator
> ------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-5015
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5015
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/facet
>    Affects Versions: 4.3
>            Reporter: Rob Audenaerde
>            Priority: Minor
>         Attachments: LUCENE-5015.patch
>
>
> I have an unexpected performance difference between the SamplingAccumulator 
> and the StandardFacetAccumulator. 
> The case is an index with about 5M documents and each document containing 
> about 10 fields. I created a facet on each of those fields. When searching to 
> retrieve facet-counts (using 1 CountFacetRequest), the SamplingAccumulator is 
> about twice as fast as the StandardFacetAccumulator. This is expected and a 
> nice speed-up. 
> However, when I use more CountFacetRequests to retrieve facet-counts for more 
> than one field, the speed of the SamplingAccumulator decreases, to the point 
> where the StandardFacetAccumulator is faster. 
> {noformat} 
> FacetRequests  Sampling    Standard
>  1               391 ms     1100 ms
>  2               531 ms     1095 ms 
>  3               948 ms     1108 ms
>  4              1400 ms     1110 ms
>  5              1901 ms     1102 ms
> {noformat} 
> Is this behaviour normal? I did not expect it, as the SamplingAccumulator 
> needs to do less work? 
> Some code to show what I do:
> {code}
>       searcher.search( facetsQuery, facetsCollector );
>       final List<FacetResult> collectedFacets = facetsCollector.getFacetResults();
> {code}
> {code}
> final FacetSearchParams facetSearchParams = new FacetSearchParams( facetRequests );
> FacetsCollector facetsCollector;
> if ( isSampled )
> {
>       facetsCollector = FacetsCollector.create(
>               new SamplingAccumulator( new RandomSampler(), facetSearchParams,
>                       searcher.getIndexReader(), taxo ) );
> }
> else
> {
>       facetsCollector = FacetsCollector.create(
>               FacetsAccumulator.create( facetSearchParams, searcher.getIndexReader(), taxo ) );
> }
> {code}
>                       
