[ https://issues.apache.org/jira/browse/LUCENE-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gilad Barkai updated LUCENE-5015:
---------------------------------

    Attachment: LUCENE-5015.patch

Added a parameter to {{SamplingParams}} named {{fixToExact}}, which defaults to {{false}}. I think it is likely that someone who uses sampling is not interested in exact results.

In the proposed approach, the {{Sampler}} creates the old, slow, and accurate {{TakmiSampleFixer}} if {{SamplingParams.shouldFixToExact()}} is {{true}}. Otherwise the much (much!) faster {{AmortizedSampleFixer}} is used, which takes only the sampling ratio into account, assuming the sampled set represents the whole set with 100% accuracy.

With these changes, the code in the issue description should already use the amortized fixer, as it is now the default. If the old fixer is to be used, for comparison, the code could look as follows:
{code}
final FacetSearchParams facetSearchParams = new FacetSearchParams( facetRequests );

FacetsCollector facetsCollector;
if ( isSampled )
{
    // Create SamplingParams that request fixing the sampled counts to exact values
    SamplingParams samplingParams = new SamplingParams();
    samplingParams.setFixToExact(true);

    // Use the custom sampling params while creating the RandomSampler
    facetsCollector = FacetsCollector.create(
        new SamplingAccumulator( new RandomSampler(samplingParams, new Random(someSeed)),
                                 facetSearchParams, searcher.getIndexReader(), taxo ) );
}
else
{
    facetsCollector = FacetsCollector.create(
        FacetsAccumulator.create( facetSearchParams, searcher.getIndexReader(), taxo ) );
}
{code}

The sampling tests still use the "exact" fixer, as it is not easy to assert against amortized results. I'm still looking into creating a complete faceted search flow test with the amortized fixer.
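To illustrate what "taking only the sampling ratio into account" could mean, here is a minimal standalone sketch. It is not the code in the attached patch and does not use any Lucene classes: the class name {{AmortizedFixSketch}}, the method {{amortize}}, and the map-based representation of facet counts are all hypothetical, chosen only to show the arithmetic. The assumption is that each count measured on the sample is simply divided by the sample ratio.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical illustration of amortized count fixing.
 * This is NOT Lucene's AmortizedSampleFixer; names and types are assumptions.
 */
final class AmortizedFixSketch {

    private AmortizedFixSketch() {}

    /**
     * Scales counts gathered on a sample up to an estimate for the full
     * result set, assuming the sample represents the whole set perfectly.
     *
     * @param sampledCounts facet label to count observed in the sample
     * @param sampleRatio   fraction of matching documents that were sampled, in (0, 1]
     * @return facet label to estimated count, in the same iteration order
     */
    static Map<String, Long> amortize(Map<String, Integer> sampledCounts, double sampleRatio) {
        // Written as a negated range check so that NaN is rejected as well.
        if (!(sampleRatio > 0.0 && sampleRatio <= 1.0)) {
            throw new IllegalArgumentException("sampleRatio must be in (0, 1], got " + sampleRatio);
        }
        Map<String, Long> estimated = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : sampledCounts.entrySet()) {
            // No second pass over the documents: just divide by the ratio.
            estimated.put(entry.getKey(), Math.round(entry.getValue() / sampleRatio));
        }
        return estimated;
    }
}
```

The cost of such a fix depends only on the number of categories with a count, not on the number of matching documents, which is why it can be much cheaper than recounting. The price is that the result is an estimate, so tests cannot assert exact values against it.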
> Unexpected performance difference between SamplingAccumulator and StandardFacetAccumulator
> ------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-5015
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5015
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/facet
>    Affects Versions: 4.3
>            Reporter: Rob Audenaerde
>            Priority: Minor
>         Attachments: LUCENE-5015.patch
>
>
> I have an unexpected performance difference between the SamplingAccumulator
> and the StandardFacetAccumulator.
> The case is an index with about 5M documents, each document containing
> about 10 fields. I created a facet on each of those fields. When searching to
> retrieve facet counts (using 1 CountFacetRequest), the SamplingAccumulator is
> about twice as fast as the StandardFacetAccumulator. This is expected and a
> nice speed-up.
> However, when I use more CountFacetRequests to retrieve facet counts for more
> than one field, the speed of the SamplingAccumulator decreases, to the point
> where the StandardFacetAccumulator is faster.
> {noformat}
> FacetRequests  Sampling    Standard
>  1              391 ms     1100 ms
>  2              531 ms     1095 ms
>  3              948 ms     1108 ms
>  4             1400 ms     1110 ms
>  5             1901 ms     1102 ms
> {noformat}
> Is this behaviour normal? I did not expect it, as the SamplingAccumulator
> needs to do less work?
> Some code to show what I do:
> {code}
> 	searcher.search( facetsQuery, facetsCollector );
> 	final List<FacetResult> collectedFacets = facetsCollector.getFacetResults();
> {code}
> {code}
> final FacetSearchParams facetSearchParams = new FacetSearchParams( facetRequests );
> FacetsCollector facetsCollector;
> if ( isSampled )
> {
> 	facetsCollector =
> 		FacetsCollector.create( new SamplingAccumulator( new RandomSampler(), facetSearchParams, searcher.getIndexReader(), taxo ) );
> }
> else
> {
> 	facetsCollector = FacetsCollector.create( FacetsAccumulator.create( facetSearchParams, searcher.getIndexReader(), taxo ) );
> }
> {code}