[ https://issues.apache.org/jira/browse/LUCENE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-4764:
---------------------------------------

    Attachment: LUCENE-4764.patch

Initial dirty patch (lots of nocommits still):

I added a FacetDocValuesFormat, which goes back to the 4.0 format
(more RAM-consuming, but faster for facets), and also hacked the
FastCountingFacetsAggregator to decode directly from the full byte[],
saving the overhead of a method call and of filling a BytesRef.  It
gets faster results than the default (Lucene42) DVFormat.

This is wikibig, all 6.6M docs, 7 facet dims:

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                 LowTerm      110.44      (2.0%)      104.86      (1.0%)   -5.1% (  -7% -   -2%)
                  Fuzzy1       46.50      (2.6%)       44.83      (1.3%)   -3.6% (  -7% -    0%)
             MedSpanNear       28.61      (2.9%)       27.91      (1.8%)   -2.5% (  -6% -    2%)
                 Respell       45.56      (4.0%)       44.71      (3.1%)   -1.9% (  -8% -    5%)
                  Fuzzy2       52.44      (3.6%)       51.69      (2.2%)   -1.4% (  -6% -    4%)
               LowPhrase       21.30      (6.3%)       21.01      (6.0%)   -1.4% ( -12% -   11%)
             LowSpanNear        8.37      (2.4%)        8.26      (3.3%)   -1.3% (  -6% -    4%)
         MedSloppyPhrase       25.88      (2.4%)       25.73      (2.3%)   -0.6% (  -5% -    4%)
              AndHighMed      105.02      (1.4%)      105.78      (1.0%)    0.7% (  -1% -    3%)
         LowSloppyPhrase       20.32      (3.2%)       20.55      (3.5%)    1.1% (  -5% -    8%)
            HighSpanNear        3.51      (2.4%)        3.56      (1.7%)    1.2% (  -2% -    5%)
              HighPhrase       17.32     (10.1%)       17.56     (10.2%)    1.4% ( -17% -   24%)
              AndHighLow      575.37      (3.9%)      583.69      (3.7%)    1.4% (  -5% -    9%)
        HighSloppyPhrase        0.92      (6.2%)        0.95      (6.8%)    2.4% (  -9% -   16%)
             AndHighHigh       23.25      (1.4%)       24.54      (0.9%)    5.5% (   3% -    7%)
               MedPhrase      110.00      (5.3%)      117.78      (6.1%)    7.1% (  -4% -   19%)
                Wildcard       27.31      (2.1%)       32.28      (1.6%)   18.2% (  14% -   22%)
                 MedTerm       46.99      (2.7%)       57.33      (1.8%)   22.0% (  17% -   27%)
               OrHighMed       16.38      (3.6%)       21.44      (3.2%)   30.9% (  23% -   39%)
              OrHighHigh        8.63      (3.7%)       11.33      (3.6%)   31.3% (  23% -   39%)
               OrHighLow       16.88      (3.5%)       22.21      (3.3%)   31.6% (  23% -   39%)
                 Prefix3       12.91      (2.9%)       17.29      (2.0%)   33.9% (  28% -   39%)
                HighTerm       18.99      (2.8%)       25.99      (2.5%)   36.9% (  30% -   43%)
                  IntNRQ        3.54      (3.2%)        4.96      (2.2%)   40.0% (  33% -   46%)
{noformat}

But it also uses more disk and RAM: trunk facet DVs take 61.2 MB,
while the patch takes 80.3 MB (31% more).
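
To illustrate the "decode directly from the full byte[]" idea, here is a
minimal sketch, not the code in the patch. It assumes a hypothetical layout:
every document's facet ordinals are delta-coded as variable-length ints and
concatenated into one byte[], with a plain int[] of start addresses. The
class name, field names, and the exact vInt byte order are all assumptions
made for illustration; the real format and aggregator may differ.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch: one shared byte[] for all docs plus an int[] of
// per-document start offsets (more RAM, but a single array lookup per doc).
class DirectFacetDecodeSketch {
    final byte[] bytes;      // all documents' encoded ordinals, concatenated
    final int[] addresses;   // addresses[doc]..addresses[doc + 1] is doc's slice

    // Encodes each document's ascending ordinals as delta-coded vInts.
    DirectFacetDecodeSketch(int[][] ordinalsPerDoc) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        addresses = new int[ordinalsPerDoc.length + 1];
        for (int doc = 0; doc < ordinalsPerDoc.length; doc++) {
            addresses[doc] = out.size();
            int prev = 0;
            for (int ord : ordinalsPerDoc[doc]) {
                int delta = ord - prev;
                prev = ord;
                // 7 payload bits per byte, high bit set means "more bytes follow"
                while ((delta & ~0x7F) != 0) {
                    out.write((delta & 0x7F) | 0x80);
                    delta >>>= 7;
                }
                out.write(delta);
            }
        }
        addresses[ordinalsPerDoc.length] = out.size();
        bytes = out.toByteArray();
    }

    // Counts the ordinals of one document by walking the shared byte[]
    // in place: no per-document method call into the format, and no
    // intermediate BytesRef-like object to fill.
    void count(int doc, int[] counts) {
        int end = addresses[doc + 1];
        int ord = 0;    // running ordinal (sum of deltas so far)
        int value = 0;  // delta currently being assembled
        int shift = 0;
        for (int i = addresses[doc]; i < end; i++) {
            byte b = bytes[i];
            value |= (b & 0x7F) << shift;
            if (b >= 0) {          // high bit clear: last byte of this vInt
                ord += value;
                counts[ord]++;
                value = 0;
                shift = 0;
            } else {
                shift += 7;
            }
        }
    }
}
```

The trade-off in this sketch mirrors the one measured above: the plain
int[] of addresses costs 4 bytes per document, in exchange for the cheapest
possible lookup on every collected docID.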

                
> Faster but more RAM/Disk consuming DocValuesFormat for facets
> -------------------------------------------------------------
>
>                 Key: LUCENE-4764
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4764
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.2, 5.0
>
>         Attachments: LUCENE-4764.patch
>
>
> The new default DV format for binary fields has much more
> RAM-efficient encoding of the address for each document ... but it's
> also a bit slower at decode time, which affects facets because we
> decode for every collected docID.
