[jira] [Commented] (LUCENE-4609) Write a PackedIntsEncoder/Decoder for facets

Michael McCandless (JIRA) Tue, 22 Jan 2013 14:46:14 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560129#comment-13560129 ]


Michael McCandless commented on LUCENE-4609:
--------------------------------------------

Ugh!  My DV total bytes numbers were too high: luceneutil also indexes
the title field as DV.  So ignore the earlier byte sizes ... here are the
[correct, I hope!] byte sizes for the NO_PARENTS case, full 6.6M Wikipedia en
index: DV (index) 151208 KB, int[] (in RAM) 305889 KB.  And here is the
NO_PARENTS perf (base = trunk, comp = int[] collector):

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                Wildcard       74.70      (3.3%)       74.32      (1.9%)   -0.5% (  -5% -    4%)
                PKLookup      245.87      (1.8%)      244.80      (2.0%)   -0.4% (  -4% -    3%)
              HighPhrase       15.68      (5.7%)       15.72      (6.4%)    0.2% ( -11% -   12%)
                 Respell      111.09      (3.5%)      111.33      (3.7%)    0.2% (  -6% -    7%)
              AndHighLow       97.90      (1.6%)       98.16      (1.4%)    0.3% (  -2% -    3%)
             LowSpanNear        7.62      (3.8%)        7.67      (3.5%)    0.7% (  -6% -    8%)
                 Prefix3       45.94      (5.6%)       46.34      (2.7%)    0.9% (  -6% -    9%)
                  IntNRQ       18.04      (8.2%)       18.20      (4.6%)    0.9% ( -11% -   14%)
         LowSloppyPhrase       17.77      (2.9%)       17.94      (4.8%)    1.0% (  -6% -    8%)
                  Fuzzy2       41.36      (2.4%)       42.68      (2.3%)    3.2% (  -1% -    8%)
               LowPhrase       16.94      (2.4%)       17.65      (3.5%)    4.1% (  -1% -   10%)
            HighSpanNear        2.98      (2.8%)        3.14      (2.1%)    5.3% (   0% -   10%)
              AndHighMed       49.18      (1.0%)       51.97      (0.7%)    5.7% (   3% -    7%)
        HighSloppyPhrase        0.90      (6.7%)        0.97     (12.6%)    6.8% ( -11% -   27%)
         MedSloppyPhrase       18.54      (1.8%)       19.91      (3.0%)    7.4% (   2% -   12%)
             MedSpanNear       19.86      (1.6%)       21.36      (2.0%)    7.5% (   3% -   11%)
               MedPhrase       55.57      (2.2%)       60.31      (2.3%)    8.5% (   3% -   13%)
                  Fuzzy1       33.38      (1.4%)       37.19      (1.9%)   11.4% (   8% -   14%)
             AndHighHigh       12.58      (1.2%)       14.66      (0.9%)   16.6% (  14% -   18%)
                 LowTerm       40.41      (1.2%)       47.14      (1.4%)   16.6% (  13% -   19%)
                 MedTerm       23.00      (1.4%)       27.14      (3.0%)   18.0% (  13% -   22%)
               OrHighMed        7.50      (2.2%)       10.16      (2.3%)   35.6% (  30% -   40%)
               OrHighLow        7.55      (2.0%)       10.30      (2.8%)   36.3% (  30% -   41%)
                HighTerm        7.92      (1.9%)       10.98      (2.8%)   38.6% (  33% -   44%)
              OrHighHigh        4.30      (2.7%)        6.39      (3.0%)   48.6% (  41% -   55%)
{noformat}

                
> Write a PackedIntsEncoder/Decoder for facets
> --------------------------------------------
>
>                 Key: LUCENE-4609
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4609
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Shai Erera
>            Priority: Minor
>         Attachments: LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch, 
> LUCENE-4609.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the 
> category ordinals. We have several such encoders, including VInt (default), 
> and block encoders.
> It would be interesting to implement and benchmark a 
> PackedIntsEncoder/Decoder, with potentially two variants: (1) one that 
> receives bitsPerValue up front, e.g. when you know that you have a small 
> taxonomy and the max value you can see, and (2) one that decides for each 
> doc on the optimal bitsPerValue and writes it as a header in the byte[] or 
> something.
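
The two variants in the issue description can be illustrated with a rough, self-contained sketch. This is not the Lucene facets IntEncoder/IntDecoder API and not the code in the attached patches: the class name PackedOrdinalsSketch, all method names, and the header layout (one byte for bitsPerValue followed by a 4-byte value count) are assumptions made only for illustration. It also assumes non-negative ordinals and 1 to 31 bits per value.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical sketch of packed ordinal encoding; not Lucene's API. */
public class PackedOrdinalsSketch {

    /** Smallest number of bits that can hold maxValue (at least 1). */
    public static int bitsRequired(int maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("ordinals must be non-negative");
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(maxValue));
    }

    /** Variant 1: bitsPerValue is known up front; values are packed MSB first. */
    public static byte[] pack(int[] values, int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > 31) {
            throw new IllegalArgumentException("bitsPerValue must be in [1, 31]");
        }
        byte[] out = new byte[(values.length * bitsPerValue + 7) / 8];
        int bitPos = 0;
        for (int v : values) {
            if (v < 0 || (v >>> bitsPerValue) != 0) {
                throw new IllegalArgumentException(v + " does not fit in " + bitsPerValue + " bits");
            }
            for (int b = bitsPerValue - 1; b >= 0; b--, bitPos++) {
                if (((v >>> b) & 1) != 0) {
                    out[bitPos >>> 3] |= (byte) (0x80 >>> (bitPos & 7));
                }
            }
        }
        return out;
    }

    /** Reads count values of bitsPerValue bits each, starting at byte offset. */
    public static int[] unpack(byte[] packed, int offset, int count, int bitsPerValue) {
        int[] values = new int[count];
        int bitPos = offset * 8;
        for (int i = 0; i < count; i++) {
            int v = 0;
            for (int b = 0; b < bitsPerValue; b++, bitPos++) {
                int bit = (packed[bitPos >>> 3] >>> (7 - (bitPos & 7))) & 1;
                v = (v << 1) | bit;
            }
            values[i] = v;
        }
        return values;
    }

    /** Variant 2: choose bitsPerValue per document and store it in a header. */
    public static byte[] encodePerDoc(int[] ordinals) {
        int max = 0;
        for (int o : ordinals) {
            max = Math.max(max, o);
        }
        int bitsPerValue = bitsRequired(max);
        byte[] payload = pack(ordinals, bitsPerValue);
        // Assumed layout: [bitsPerValue: 1 byte][count: 4 bytes][packed bits]
        return ByteBuffer.allocate(5 + payload.length)
                .put((byte) bitsPerValue)
                .putInt(ordinals.length)
                .put(payload)
                .array();
    }

    /** Inverse of encodePerDoc: reads the header, then unpacks the payload. */
    public static int[] decodePerDoc(byte[] encoded) {
        ByteBuffer header = ByteBuffer.wrap(encoded);
        int bitsPerValue = header.get();
        int count = header.getInt();
        return unpack(encoded, 5, count, bitsPerValue);
    }

    public static void main(String[] args) {
        int[] ordinals = {3, 17, 42, 5};
        byte[] encoded = encodePerDoc(ordinals);
        System.out.println("bitsPerValue = " + encoded[0]);
        System.out.println("round trip   = " + Arrays.toString(decodePerDoc(encoded)));
    }
}
```

In this sketch the fixed 5-byte header can outweigh the savings for documents with only a few ordinals, so a real encoder would probably want a more compact header (or take the count from the surrounding byte[] length); that trade-off is one of the things a benchmark against VInt would need to measure.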

