[jira] [Commented] (LUCENE-4757) Cleanup FacetsAccumulator API path

Michael McCandless (JIRA) Thu, 07 Feb 2013 12:17:14 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573878#comment-13573878 ]


Michael McCandless commented on LUCENE-4757:
--------------------------------------------

Perf results on full en wiki index for last patch:
{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                  IntNRQ        4.14      (2.7%)        3.86      (2.6%)   -6.9% ( -11% -   -1%)
                HighTerm       22.41      (2.8%)       20.95      (3.3%)   -6.5% ( -12% -    0%)
                 MedTerm       53.33      (2.4%)       50.23      (2.8%)   -5.8% ( -10% -    0%)
                 Prefix3       14.82      (2.7%)       13.97      (2.2%)   -5.8% ( -10% -    0%)
               OrHighLow       19.22      (2.9%)       18.16      (2.8%)   -5.5% ( -10% -    0%)
               OrHighMed       18.62      (2.9%)       17.62      (2.7%)   -5.4% ( -10% -    0%)
              OrHighHigh        9.80      (3.1%)        9.28      (2.9%)   -5.3% ( -10% -    0%)
                Wildcard       29.91      (1.7%)       28.74      (1.6%)   -3.9% (  -7% -    0%)
                 LowTerm      111.45      (2.0%)      109.14      (1.3%)   -2.1% (  -5% -    1%)
             AndHighHigh       23.73      (1.1%)       23.24      (1.1%)   -2.0% (  -4% -    0%)
               MedPhrase      115.26      (5.8%)      113.02      (5.7%)   -1.9% ( -12% -   10%)
                  Fuzzy1       47.08      (2.2%)       46.66      (2.3%)   -0.9% (  -5% -    3%)
              HighPhrase       17.55     (10.3%)       17.40     (10.3%)   -0.9% ( -19% -   21%)
              AndHighLow      601.66      (2.4%)      597.32      (1.7%)   -0.7% (  -4% -    3%)
        HighSloppyPhrase        0.94      (7.1%)        0.93      (6.3%)   -0.6% ( -13% -   13%)
              AndHighMed      105.65      (1.4%)      105.15      (1.0%)   -0.5% (  -2% -    1%)
               LowPhrase       21.21      (6.1%)       21.12      (6.1%)   -0.4% ( -11% -   12%)
                 Respell       46.16      (3.9%)       45.96      (4.4%)   -0.4% (  -8% -    8%)
                  Fuzzy2       53.16      (3.1%)       52.95      (3.2%)   -0.4% (  -6% -    6%)
         MedSloppyPhrase       26.11      (2.2%)       26.02      (1.9%)   -0.3% (  -4% -    3%)
         LowSloppyPhrase       20.53      (2.8%)       20.47      (2.4%)   -0.3% (  -5% -    5%)
            HighSpanNear        3.53      (1.9%)        3.53      (1.8%)   -0.0% (  -3% -    3%)
             MedSpanNear       28.56      (1.9%)       28.56      (2.1%)   -0.0% (  -3% -    4%)
             LowSpanNear        8.31      (2.6%)        8.34      (2.8%)    0.3% (  -5% -    5%)
{noformat}

Net/net no faster (maybe a bit slower!), so I think we should just go back to 
the previous patch?
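
For reading the range column in the table above: the sketch below assumes (it is not stated in this comment) that the two bounds come from shifting each mean QPS by one standard deviation in opposite directions and truncating the percentage toward zero. Fed the rounded values from the table, that assumption reproduces the IntNRQ and HighPhrase ranges. The class and method names are made up for illustration and are not part of any benchmark tool.

```java
final class PctDiffRange {
    /**
     * Returns {worst, best} percentage change of comp relative to base.
     * Assumption: StdDev columns are percentages of the mean QPS, and the
     * bounds shift both means by one standard deviation in opposite directions.
     */
    static int[] range(double qpsBase, double sdBasePct, double qpsComp, double sdCompPct) {
        double sdBase = qpsBase * sdBasePct / 100.0;
        double sdComp = qpsComp * sdCompPct / 100.0;
        // Worst case: comp at its low end, base at its high end.
        double worst = 100.0 * ((qpsComp - sdComp) - (qpsBase + sdBase)) / (qpsBase + sdBase);
        // Best case: comp at its high end, base at its low end.
        double best = 100.0 * ((qpsComp + sdComp) - (qpsBase - sdBase)) / (qpsBase - sdBase);
        // The int cast truncates toward zero.
        return new int[] {(int) worst, (int) best};
    }

    public static void main(String[] args) {
        int[] intNrq = range(4.14, 2.7, 3.86, 2.6); // IntNRQ row, rounded inputs
        System.out.println(intNrq[0] + "% to " + intNrq[1] + "%");
    }
}
```

By this reading, IntNRQ is the only row whose whole range stays below zero; every other row has a range that reaches 0% or above.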
                
> Cleanup FacetsAccumulator API path
> ----------------------------------
>
>                 Key: LUCENE-4757
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4757
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-4757.patch, LUCENE-4757.patch, LUCENE-4757.patch, 
> LUCENE-4757.patch
>
>
> FacetsAccumulator and FacetRequest expose too many things to users, even when 
> they are not needed, e.g. complements and partitions. Also, Aggregator is 
> created per-FacetRequest, while it is in fact applied per category list. This is 
> confusing, because if you want to do two aggregations, e.g. count and 
> sum-score, you need to separate the two dimensions into two different 
> category lists at indexing time.
> It's not so easy to refactor everything in one go, since there's a lot of 
> code involved. So in this issue I will:
> * Remove complements from FacetRequest. It is only relevant to 
> CountFacetRequest anyway. In the future, it should be a special Accumulator.
> * Make FacetsAccumulator a concrete class, and have StandardFacetsAccumulator 
> extend it and handle all the stuff that's relevant to sampling, complements 
> and partitions. Gradually, these things will be migrated to the new API, and 
> hopefully StandardFacetsAccumulator will go away.
> * Aggregator is per-document. I could not break its API b/c some features 
> (e.g. complement) depend on it. So rather I created a new FacetsAggregator, 
> with a bulk, per-segment, API. So far migrated Counting and SumScore to that 
> API.
> ** In the new API, you need to override FacetsAccumulator to define an 
> Aggregator for use; the default is CountingFacetsAggregator.
> * Started to refactor FacetResultsHandler, whose API was guided by the 
> use of partitions. I added a simple {{compute(FacetArrays)}} to it, which by 
> default delegates to the nasty API, but is overridden by specific classes. 
> This will get cleaned further along too.
> * FacetRequest has a .getValueOf() which resolves an ordinal to its value 
> (i.e. which of the two arrays to use). I added FacetRequest.FacetArraysSource 
> and specialize when they are INT or FLOAT, creating a special 
> FacetResultsHandler which does not go back to FR.getValueOf for every 
> ordinal. I think that we can migrate other FacetResultsHandlers to behave 
> like that ... at the expense of code duplication.
> ** I also added a TODO to get rid of getValueOf entirely .. will be done 
> separately.
> * Got rid of CountingFacetsCollector and StandardFacetsCollector in favor of 
> a single FacetsCollector which collects matching documents, and optionally 
> scores, per-segment. I wrote a migration class from these per-segment 
> MatchingDocs to ScoredDocIDs (which is global), so that the rest of the code 
> works, but the new code works w/ the optimized per-segment API. I hope 
> performance is still roughly the same w/ these changes too.
> There will be follow-on issues to migrate more features to the new API, and 
> more cleanups ...
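
To illustrate the shape of the change described in the issue above, here is a minimal sketch of a bulk, per-segment aggregation step with counting as the default. Every type and method name in it (SegmentMatches, SegmentAggregator, Accumulator, aggregator()) is hypothetical and chosen for illustration; none of them is the actual Lucene signature from the patch.

```java
import java.util.Arrays;
import java.util.List;

final class BulkAggregationSketch {
    /** Hypothetical stand-in for the matching documents of one segment. */
    static final class SegmentMatches {
        // ordinalsPerDoc[i] holds the category ordinals of the i-th matching doc.
        final int[][] ordinalsPerDoc;

        SegmentMatches(int[][] ordinalsPerDoc) {
            this.ordinalsPerDoc = ordinalsPerDoc;
        }
    }

    /** Hypothetical bulk API: invoked once per segment, not once per document. */
    interface SegmentAggregator {
        void aggregate(SegmentMatches matches, int[] counts);
    }

    /** Counting aggregation: bump one counter per category ordinal seen. */
    static final SegmentAggregator COUNTING = (matches, counts) -> {
        for (int[] ordinals : matches.ordinalsPerDoc) {
            for (int ordinal : ordinals) {
                counts[ordinal]++;
            }
        }
    };

    /** Hypothetical accumulator; subclasses override aggregator() to swap the aggregation. */
    static class Accumulator {
        protected SegmentAggregator aggregator() {
            return COUNTING; // default, mirroring "the default is counting"
        }

        int[] accumulate(List<SegmentMatches> segments, int numOrdinals) {
            int[] counts = new int[numOrdinals];
            SegmentAggregator agg = aggregator();
            for (SegmentMatches segment : segments) {
                agg.aggregate(segment, counts);
            }
            return counts;
        }
    }

    public static void main(String[] args) {
        List<SegmentMatches> segments = List.of(
            new SegmentMatches(new int[][] {{0, 2}, {2}}),
            new SegmentMatches(new int[][] {{1, 2}}));
        int[] counts = new Accumulator().accumulate(segments, 3);
        System.out.println(Arrays.toString(counts));
    }
}
```

A sum-score variant would, under the same assumptions, be another SegmentAggregator returned from an overridden aggregator(), writing into a float array instead of int counters.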

