[ https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-4609:
---------------------------------------

    Attachment: LUCENE-4609.patch

Here's another attempt (a total prototype, not committable) at using 
PackedInts to hold the ords.

It's hacked up: it visits every byte[] from DocValues in the index, converts 
each one to an in-RAM PackedInts array, and then does all facet counting from 
those arrays.
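
For illustration only (this is not the attached patch, and not Lucene's PackedInts API): a minimal plain-Java sketch of the general idea, assuming the ords of all docs are flattened into one array with a per-doc offsets table. The class PackedOrds, the method countFacets and the docOffsets layout are hypothetical names invented for this sketch.

```java
/** Hypothetical fixed-width bit packing of ordinals; NOT Lucene's PackedInts. */
final class PackedOrds {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    /** Packs non-negative ords, each <= maxOrd, using the fewest bits that fit maxOrd. */
    PackedOrds(int[] ords, int maxOrd) {
        this.bitsPerValue = Math.max(1, 32 - Integer.numberOfLeadingZeros(maxOrd));
        this.mask = (1L << bitsPerValue) - 1;
        this.blocks = new long[(int) (((long) ords.length * bitsPerValue + 63) / 64)];
        for (int i = 0; i < ords.length; i++) {
            set(i, ords[i]);
        }
    }

    private void set(int index, long value) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        blocks[block] |= (value & mask) << shift;
        // Number of bits that did not fit into this 64-bit block, if any.
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            blocks[block + 1] |= (value & mask) >>> (bitsPerValue - spill);
        }
    }

    /** Random-access read of a single value. */
    int get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long v = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            v |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return (int) (v & mask);
    }

    /**
     * Counts ordinals over the matching docs. The ords of doc d are assumed to
     * live at positions docOffsets[d] .. docOffsets[d + 1] - 1.
     */
    static int[] countFacets(PackedOrds ords, int[] docOffsets, int[] hitDocs, int numOrds) {
        int[] counts = new int[numOrds];
        for (int doc : hitDocs) {
            for (int i = docOffsets[doc]; i < docOffsets[doc + 1]; i++) {
                counts[ords.get(i)]++;
            }
        }
        return counts;
    }
}
```

Each get() here pays for a multiply, two shifts and a possible second block read per ordinal, which is the kind of per-value overhead a bulk decode would try to amortize.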

But the performance is sort of 'meh'; most differences are within the noise:

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                 MedTerm      109.40      (1.5%)      102.06      (1.5%)   -6.7% (  -9% -   -3%)
              AndHighLow      374.95      (3.0%)      361.19      (2.6%)   -3.7% (  -8% -    1%)
              AndHighMed      172.57      (1.5%)      169.35      (1.1%)   -1.9% (  -4% -    0%)
                 Prefix3      177.54      (6.2%)      174.26      (8.0%)   -1.8% ( -15% -   13%)
                  IntNRQ      116.07      (7.5%)      113.97      (9.3%)   -1.8% ( -17% -   16%)
                  Fuzzy2       86.19      (2.4%)       85.16      (2.8%)   -1.2% (  -6% -    4%)
             AndHighHigh       46.76      (1.4%)       46.36      (1.1%)   -0.8% (  -3% -    1%)
                 LowTerm      146.56      (1.8%)      145.58      (1.4%)   -0.7% (  -3% -    2%)
                HighTerm       26.35      (2.0%)       26.20      (2.1%)   -0.6% (  -4% -    3%)
             MedSpanNear       64.98      (2.3%)       64.62      (2.8%)   -0.5% (  -5% -    4%)
         LowSloppyPhrase       67.07      (2.3%)       66.80      (3.6%)   -0.4% (  -6% -    5%)
               OrHighMed       25.18      (1.6%)       25.10      (2.1%)   -0.3% (  -3% -    3%)
                Wildcard      256.33      (3.1%)      255.56      (3.5%)   -0.3% (  -6% -    6%)
                PKLookup      305.42      (2.3%)      304.72      (2.1%)   -0.2% (  -4% -    4%)
               OrHighLow       24.59      (1.3%)       24.54      (2.2%)   -0.2% (  -3% -    3%)
                  Fuzzy1       81.38      (3.0%)       81.60      (2.7%)    0.3% (  -5% -    6%)
                 Respell      141.17      (3.8%)      141.87      (3.9%)    0.5% (  -6% -    8%)
             LowSpanNear       38.34      (3.2%)       38.78      (3.0%)    1.1% (  -4% -    7%)
         MedSloppyPhrase       63.80      (2.1%)       64.53      (3.5%)    1.1% (  -4% -    6%)
            HighSpanNear       10.20      (2.8%)       10.32      (3.1%)    1.2% (  -4% -    7%)
               MedPhrase      103.16      (4.5%)      104.72      (2.1%)    1.5% (  -4% -    8%)
              OrHighHigh       17.81      (1.5%)       18.18      (2.7%)    2.1% (  -2% -    6%)
               LowPhrase       58.77      (5.5%)       60.49      (3.0%)    2.9% (  -5% -   12%)
              HighPhrase       38.68     (10.0%)       40.46      (5.6%)    4.6% ( -10% -   22%)
        HighSloppyPhrase        2.97      (7.9%)        3.22     (12.6%)    8.3% ( -11% -   31%)

{noformat}

Maybe it would be better if I used the bulk-read PackedInts APIs instead...
                
> Write a PackedIntsEncoder/Decoder for facets
> --------------------------------------------
>
>                 Key: LUCENE-4609
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4609
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Shai Erera
>            Priority: Minor
>         Attachments: LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the 
> category ordinals. We have several such encoders, including VInt (default), 
> and block encoders.
> It would be interesting to implement and benchmark a 
> PackedIntsEncoder/Decoder, with potentially two variants: (1) receives 
> bitsPerValue up front, when you e.g. know that you have a small taxonomy and 
> the max value you can see and (2) one that decides for each doc on the 
> optimal bitsPerValue, writes it as a header in the byte[] or something.
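
To make variant (2) from the description above concrete, here is a rough Java sketch under stated assumptions; it is not the facet module's IntEncoder/IntDecoder API and not taken from any attached patch. The class name PerDocPackedCodec and the one-byte header layout (bitsPerValue - 1 in the high 5 bits, the count of unused padding bits in the low 3 bits) are invented for illustration. Bits are written one at a time for clarity, not speed.

```java
/** Hypothetical per-document codec: picks bitsPerValue per doc and stores it in a header byte. */
final class PerDocPackedCodec {

    /** Encodes the non-negative ordinals of one document into a self-describing byte[]. */
    static byte[] encode(int[] ords) {
        int max = 0;
        for (int ord : ords) {
            if (ord < 0) {
                throw new IllegalArgumentException("ordinals must be non-negative");
            }
            max = Math.max(max, ord);
        }
        // Smallest width (1..31) that can hold the largest ordinal of this doc.
        int bits = Math.max(1, 32 - Integer.numberOfLeadingZeros(max));
        int totalBits = ords.length * bits;
        int payloadBytes = (totalBits + 7) / 8;
        int padBits = payloadBytes * 8 - totalBits;

        byte[] out = new byte[1 + payloadBytes];
        // Assumed header layout: (bits - 1) in the high 5 bits, padBits in the low 3 bits.
        out[0] = (byte) (((bits - 1) << 3) | padBits);

        int bitPos = 0;
        for (int ord : ords) {
            // Write each value most-significant bit first.
            for (int b = bits - 1; b >= 0; b--, bitPos++) {
                if (((ord >>> b) & 1) != 0) {
                    out[1 + (bitPos >>> 3)] |= (byte) (0x80 >>> (bitPos & 7));
                }
            }
        }
        return out;
    }

    /** Decodes a byte[] produced by encode(). */
    static int[] decode(byte[] in) {
        int header = in[0] & 0xFF;
        int bits = (header >>> 3) + 1;
        int padBits = header & 7;
        // The padding count lets the decoder recover the exact number of values.
        int count = ((in.length - 1) * 8 - padBits) / bits;

        int[] ords = new int[count];
        int bitPos = 0;
        for (int i = 0; i < count; i++) {
            int v = 0;
            for (int b = 0; b < bits; b++, bitPos++) {
                int bit = (in[1 + (bitPos >>> 3)] >>> (7 - (bitPos & 7))) & 1;
                v = (v << 1) | bit;
            }
            ords[i] = v;
        }
        return ords;
    }
}
```

Variant (1) would drop the per-doc width computation and take bitsPerValue as a constructor argument; how the value count is recovered in that case is left open here.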
