[https://issues.apache.org/jira/browse/LUCENE-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573878#comment-13573878]
Michael McCandless commented on LUCENE-4757:
--------------------------------------------
Perf results on the full English Wikipedia index for the last patch:
{noformat}
Task              QPS base   StdDev  QPS comp   StdDev   Pct diff
IntNRQ                4.14   (2.7%)      3.86   (2.6%)   -6.9% ( -11% -  -1%)
HighTerm             22.41   (2.8%)     20.95   (3.3%)   -6.5% ( -12% -   0%)
MedTerm              53.33   (2.4%)     50.23   (2.8%)   -5.8% ( -10% -   0%)
Prefix3              14.82   (2.7%)     13.97   (2.2%)   -5.8% ( -10% -   0%)
OrHighLow            19.22   (2.9%)     18.16   (2.8%)   -5.5% ( -10% -   0%)
OrHighMed            18.62   (2.9%)     17.62   (2.7%)   -5.4% ( -10% -   0%)
OrHighHigh            9.80   (3.1%)      9.28   (2.9%)   -5.3% ( -10% -   0%)
Wildcard             29.91   (1.7%)     28.74   (1.6%)   -3.9% (  -7% -   0%)
LowTerm             111.45   (2.0%)    109.14   (1.3%)   -2.1% (  -5% -   1%)
AndHighHigh          23.73   (1.1%)     23.24   (1.1%)   -2.0% (  -4% -   0%)
MedPhrase           115.26   (5.8%)    113.02   (5.7%)   -1.9% ( -12% -  10%)
Fuzzy1               47.08   (2.2%)     46.66   (2.3%)   -0.9% (  -5% -   3%)
HighPhrase           17.55  (10.3%)     17.40  (10.3%)   -0.9% ( -19% -  21%)
AndHighLow          601.66   (2.4%)    597.32   (1.7%)   -0.7% (  -4% -   3%)
HighSloppyPhrase      0.94   (7.1%)      0.93   (6.3%)   -0.6% ( -13% -  13%)
AndHighMed          105.65   (1.4%)    105.15   (1.0%)   -0.5% (  -2% -   1%)
LowPhrase            21.21   (6.1%)     21.12   (6.1%)   -0.4% ( -11% -  12%)
Respell              46.16   (3.9%)     45.96   (4.4%)   -0.4% (  -8% -   8%)
Fuzzy2               53.16   (3.1%)     52.95   (3.2%)   -0.4% (  -6% -   6%)
MedSloppyPhrase      26.11   (2.2%)     26.02   (1.9%)   -0.3% (  -4% -   3%)
LowSloppyPhrase      20.53   (2.8%)     20.47   (2.4%)   -0.3% (  -5% -   5%)
HighSpanNear          3.53   (1.9%)      3.53   (1.8%)   -0.0% (  -3% -   3%)
MedSpanNear          28.56   (1.9%)     28.56   (2.1%)   -0.0% (  -3% -   4%)
LowSpanNear           8.31   (2.6%)      8.34   (2.8%)    0.3% (  -5% -   5%)
{noformat}
Net/net it's no faster (maybe a bit slower!), so I think we should just go back
to the previous patch?
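As a sanity check, the "Pct diff" column can be recomputed from the two QPS columns. This is a minimal sketch that assumes the column is the relative change of the comp QPS over the base QPS; `PctDiff` is a hypothetical helper, and it does not attempt the bracketed range, whose exact formula isn't given here. Because the table prints QPS rounded to two decimals, the last digit can differ from the printed value (IntNRQ comes out at -6.8% here vs. -6.9% in the table).

```java
// Hypothetical helper: recomputes "Pct diff" from the (rounded) QPS columns.
public final class PctDiff {

    /** Relative change of the patched ("comp") QPS over the baseline QPS, in percent. */
    public static double pctDiff(double baseQps, double compQps) {
        return 100.0 * (compQps - baseQps) / baseQps;
    }

    public static void main(String[] args) {
        // A few rows copied from the table above: task, QPS base, QPS comp.
        String[] tasks = {"IntNRQ", "HighTerm", "MedTerm", "AndHighLow", "LowSpanNear"};
        double[] base = {4.14, 22.41, 53.33, 601.66, 8.31};
        double[] comp = {3.86, 20.95, 50.23, 597.32, 8.34};
        for (int i = 0; i < tasks.length; i++) {
            System.out.printf("%-12s %5.1f%%%n", tasks[i], pctDiff(base[i], comp[i]));
        }
    }
}
```

The same rounding effect shows up for LowSpanNear, which comes out at 0.4% vs. the printed 0.3%, so small last-digit mismatches against the table are expected.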
> Cleanup FacetsAccumulator API path
> ----------------------------------
>
> Key: LUCENE-4757
> URL: https://issues.apache.org/jira/browse/LUCENE-4757
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Shai Erera
> Assignee: Shai Erera
> Attachments: LUCENE-4757.patch, LUCENE-4757.patch, LUCENE-4757.patch, LUCENE-4757.patch
>
>
> FacetsAccumulator and FacetRequest expose too many things to users, even when
> they are not needed, e.g. complements and partitions. Also, Aggregator is
> created per FacetRequest, while in fact it is applied per category list. This
> is confusing, because if you want to do two aggregations, e.g. count and
> sum-score, you need to separate the two dimensions into two different
> category lists at indexing time.
> It's not so easy to refactor everything in one go, since there's a lot of
> code involved. So in this issue I will:
> * Remove complements from FacetRequest. It is only relevant to
> CountFacetRequest anyway. In the future, it should be a special Accumulator.
> * Make FacetsAccumulator a concrete class, and have StandardFacetsAccumulator
> extend it and handle all the stuff that's relevant to sampling, complements
> and partitions. Gradually, these things will be migrated to the new API, and
> hopefully StandardFacetsAccumulator will go away.
> * Aggregator is per-document. I could not break its API b/c some features
> (e.g. complements) depend on it, so instead I created a new FacetsAggregator
> with a bulk, per-segment API. So far I have migrated Counting and SumScore to
> that API.
> ** In the new API, you need to override FacetsAccumulator to define which
> aggregator to use; the default is CountingFacetsAggregator.
> * Started to refactor FacetResultsHandler, whose API was guided by the use of
> partitions. I added a simple {{compute(FacetArrays)}} to it, which by default
> delegates to the nasty API but is overridden by specific classes. This will
> get cleaned up further along too.
> * FacetRequest has a .getValueOf() which resolves an ordinal to its value
> (i.e. which of the two arrays to use). I added FacetRequest.FacetArraysSource
> and specialized the INT and FLOAT cases, creating a special
> FacetResultsHandler which does not go back to FR.getValueOf for every
> ordinal. I think that we can migrate other FacetResultsHandlers to behave
> like that ... at the expense of code duplication.
> ** I also added a TODO to get rid of getValueOf entirely; that will be done
> separately.
> * Got rid of CountingFacetsCollector and StandardFacetsCollector in favor of
> a single FacetsCollector which collects matching documents, and optionally
> scores, per segment. I wrote a migration class from these per-segment
> MatchingDocs to ScoredDocIDs (which is global), so that the rest of the code
> still works, while the new code uses the optimized per-segment API. I hope
> performance is still roughly the same w/ these changes too.
> There will be follow-on issues to migrate more features to the new API, and
> more cleanups ...
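The per-segment flow described in the issue (a collector that keeps matching docs and optional scores per segment, bulk aggregators called once per segment, and an adapter back to global doc IDs) can be illustrated with a toy model. Everything here is hypothetical: `SegmentHits`, `countSegment`, `sumScoreSegment` and `toGlobalDocIds` are illustrative names, not the actual MatchingDocs, FacetsAggregator or ScoredDocIDs APIs, and category ordinals are given as plain int arrays instead of being decoded from a category list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Toy model only: SegmentHits, countSegment, sumScoreSegment and toGlobalDocIds are
// hypothetical names, not Lucene's MatchingDocs / FacetsAggregator / ScoredDocIDs APIs.
public final class PerSegmentFacetsSketch {

    /** Matching docs of one segment: segment-local doc IDs plus optional scores. */
    static final class SegmentHits {
        final int docBase;    // global doc ID of the segment's first document
        final BitSet bits;    // segment-local IDs of the matching docs
        final float[] scores; // one score per matching doc, in doc ID order (null if not kept)

        SegmentHits(int docBase, BitSet bits, float[] scores) {
            this.docBase = docBase;
            this.bits = bits;
            this.scores = scores;
        }
    }

    /** Bulk counting: called once per segment; ordinals[doc] lists a local doc's category ordinals. */
    static void countSegment(SegmentHits hits, int[][] ordinals, int[] counts) {
        for (int doc = hits.bits.nextSetBit(0); doc >= 0; doc = hits.bits.nextSetBit(doc + 1)) {
            for (int ord : ordinals[doc]) {
                counts[ord]++;
            }
        }
    }

    /** Bulk sum-score: adds each matching doc's score to all of its ordinals (needs scores). */
    static void sumScoreSegment(SegmentHits hits, int[][] ordinals, float[] values) {
        int i = 0; // position in hits.scores, advanced once per matching doc
        for (int doc = hits.bits.nextSetBit(0); doc >= 0; doc = hits.bits.nextSetBit(doc + 1)) {
            float score = hits.scores[i++];
            for (int ord : ordinals[doc]) {
                values[ord] += score;
            }
        }
    }

    /** Migration-style adapter: global doc ID = segment docBase + local doc ID. */
    static List<Integer> toGlobalDocIds(List<SegmentHits> segments) {
        List<Integer> global = new ArrayList<>();
        for (SegmentHits seg : segments) {
            for (int doc = seg.bits.nextSetBit(0); doc >= 0; doc = seg.bits.nextSetBit(doc + 1)) {
                global.add(seg.docBase + doc);
            }
        }
        return global;
    }

    static BitSet bits(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) {
            b.set(d);
        }
        return b;
    }

    // Sample index: segment 0 holds global docs 0-2, segment 1 holds global docs 3-4.
    static final List<SegmentHits> SEGMENTS = List.of(
            new SegmentHits(0, bits(0, 2), new float[] {1.0f, 0.5f}),
            new SegmentHits(3, bits(0, 1), new float[] {2.0f, 0.25f}));
    // Category ordinals per local doc, per segment; the ordinals themselves are global.
    static final int[][][] ORDINALS = {
            {{1, 2}, {2}, {3}},
            {{2}, {1, 3}}};
    static final int NUM_ORDINALS = 4;

    /** Runs the bulk counting aggregator over every segment. */
    static int[] countAll() {
        int[] counts = new int[NUM_ORDINALS];
        for (int s = 0; s < SEGMENTS.size(); s++) {
            countSegment(SEGMENTS.get(s), ORDINALS[s], counts);
        }
        return counts;
    }

    /** Runs the bulk sum-score aggregator over every segment. */
    static float[] sumAll() {
        float[] sums = new float[NUM_ORDINALS];
        for (int s = 0; s < SEGMENTS.size(); s++) {
            sumScoreSegment(SEGMENTS.get(s), ORDINALS[s], sums);
        }
        return sums;
    }

    public static void main(String[] args) {
        System.out.println("counts = " + Arrays.toString(countAll()));
        System.out.println("sums   = " + Arrays.toString(sumAll()));
        System.out.println("global = " + toGlobalDocIds(SEGMENTS));
    }
}
```

Running `main` should print `counts = [0, 2, 2, 2]`, `sums   = [0.0, 1.25, 3.0, 0.75]` and `global = [0, 2, 3, 4]`. In this shape each aggregator loops over a whole segment in one call instead of being invoked once per document, and only the adapter needs global IDs, which it gets by adding the segment's docBase.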