[jira] [Commented] (LUCENE-4609) Write a PackedIntsEncoder/Decoder for facets

Michael McCandless (JIRA) Tue, 22 Jan 2013 13:34:14 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560048#comment-13560048 ]


Michael McCandless commented on LUCENE-4609:
--------------------------------------------

The above results were for the 1M index; here are the results for the full 
English Wikipedia index (6.6M docs):
{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev        Pct diff
            HighSpanNear        2.91      (2.1%)        2.90      (2.4%)   -0.6% (  -5% -    4%)
                 Prefix3       46.35      (4.0%)       46.07      (3.9%)   -0.6% (  -8% -    7%)
                PKLookup      240.11      (1.4%)      238.95      (1.9%)   -0.5% (  -3% -    2%)
                Wildcard       73.79      (2.2%)       73.48      (2.3%)   -0.4% (  -4% -    4%)
                  IntNRQ       18.05      (6.1%)       18.01      (5.9%)   -0.2% ( -11% -   12%)
                 Respell       96.78      (3.1%)       98.09      (3.3%)    1.3% (  -4% -    7%)
         LowSloppyPhrase       17.63      (4.4%)       17.91      (3.8%)    1.6% (  -6% -   10%)
              AndHighLow      108.80      (2.8%)      110.58      (4.2%)    1.6% (  -5% -    8%)
             LowSpanNear        7.53      (4.8%)        7.67      (5.6%)    1.8% (  -8% -   12%)
        HighSloppyPhrase        0.87     (10.1%)        0.90      (9.6%)    3.2% ( -14% -   25%)
                  Fuzzy2       42.22      (2.5%)       43.90      (2.7%)    4.0% (  -1% -    9%)
              HighPhrase       15.32      (7.5%)       15.93      (5.4%)    4.0% (  -8% -   18%)
               LowPhrase       17.09      (4.3%)       18.10      (2.9%)    5.9% (  -1% -   13%)
              AndHighMed       52.60      (1.4%)       55.90      (2.1%)    6.3% (   2% -    9%)
             MedSpanNear       20.09      (2.0%)       21.44      (1.8%)    6.7% (   2% -   10%)
         MedSloppyPhrase       18.69      (3.0%)       20.00      (2.7%)    7.0% (   1% -   13%)
                  Fuzzy1       33.68      (2.0%)       37.26      (2.2%)   10.6% (   6% -   15%)
               MedPhrase       57.00      (2.9%)       63.56      (3.3%)   11.5% (   5% -   18%)
                 MedTerm       19.22      (1.2%)       21.70      (1.1%)   12.9% (  10% -   15%)
                 LowTerm       41.98      (1.2%)       48.26      (1.8%)   15.0% (  11% -   18%)
             AndHighHigh       12.09      (1.0%)       13.98      (1.2%)   15.7% (  13% -   18%)
                HighTerm        7.11      (2.1%)        9.11      (2.0%)   28.1% (  23% -   32%)
               OrHighMed        6.67      (2.4%)        8.55      (2.1%)   28.2% (  23% -   33%)
               OrHighLow        6.76      (2.1%)        8.70      (2.3%)   28.6% (  23% -   33%)
              OrHighHigh        3.84      (2.5%)        5.33      (2.7%)   38.7% (  32% -   45%)
{noformat}

The on-disk size of _dv* is 464768 KB, and the in-memory int[] is 669428 KB (44% more).

Next I'll try NO_PARENTS ord policy...
                
> Write a PackedIntsEncoder/Decoder for facets
> --------------------------------------------
>
>                 Key: LUCENE-4609
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4609
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Shai Erera
>            Priority: Minor
>         Attachments: LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch, 
> LUCENE-4609.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the 
> category ordinals. We have several such encoders, including VInt (default), 
> and block encoders.
> It would be interesting to implement and benchmark a 
> PackedIntsEncoder/Decoder, with potentially two variants: (1) receives 
> bitsPerValue up front, when you e.g. know that you have a small taxonomy and 
> the max value you can see and (2) one that decides for each doc on the 
> optimal bitsPerValue, writes it as a header in the byte[] or something.
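
For illustration only, the two variants described in the issue could be sketched roughly as below. This is not the attached patch and it does not use Lucene's IntEncoder/IntDecoder or PackedInts classes; the class name PackedOrdinalsSketch and every method name are hypothetical. It assumes ordinals are non-negative and that the decoder learns the number of values from somewhere else, since padding bits in the last byte make the count ambiguous for small bit widths.

```java
import java.util.Arrays;

// Hypothetical sketch, not Lucene API: bit-packs one document's category ordinals.
public class PackedOrdinalsSketch {

    /** Smallest bit width that can hold maxValue (at least 1 bit). Assumes maxValue >= 0. */
    public static int bitsRequired(int maxValue) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(maxValue));
    }

    /** Writes each value using `bits` bits, most significant bit first, starting at bit offset bitPos. */
    private static void pack(int[] values, int bits, byte[] out, int bitPos) {
        for (int v : values) {
            for (int b = bits - 1; b >= 0; b--, bitPos++) {
                if (((v >>> b) & 1) != 0) {
                    out[bitPos >>> 3] |= 1 << (7 - (bitPos & 7));
                }
            }
        }
    }

    /** Reads `count` values of `bits` bits each, starting at bit offset bitPos. */
    private static int[] unpack(byte[] in, int bitPos, int bits, int count) {
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            int v = 0;
            for (int b = 0; b < bits; b++, bitPos++) {
                int bit = (in[bitPos >>> 3] >>> (7 - (bitPos & 7))) & 1;
                v = (v << 1) | bit;
            }
            values[i] = v;
        }
        return values;
    }

    /** Variant (1): bitsPerValue is known up front, so no header is written. */
    public static byte[] encodeFixed(int[] ordinals, int bitsPerValue) {
        byte[] out = new byte[(ordinals.length * bitsPerValue + 7) / 8];
        pack(ordinals, bitsPerValue, out, 0);
        return out;
    }

    public static int[] decodeFixed(byte[] data, int bitsPerValue, int count) {
        return unpack(data, 0, bitsPerValue, count);
    }

    /** Variant (2): picks bitsPerValue for this document and stores it in a one-byte header. */
    public static byte[] encodePerDoc(int[] ordinals) {
        int max = 0;
        for (int ord : ordinals) {
            max = Math.max(max, ord);
        }
        int bits = bitsRequired(max);
        byte[] out = new byte[1 + (ordinals.length * bits + 7) / 8];
        out[0] = (byte) bits;          // header: bits per value
        pack(ordinals, bits, out, 8);  // payload starts after the header byte
        return out;
    }

    public static int[] decodePerDoc(byte[] data, int count) {
        return unpack(data, 8, data[0], count);
    }

    public static void main(String[] args) {
        int[] ordinals = {3, 17, 42, 5}; // made-up ordinals for one document
        byte[] perDoc = encodePerDoc(ordinals);
        System.out.println("bitsPerValue=" + perDoc[0] + ", bytes=" + perDoc.length);
        System.out.println(Arrays.toString(decodePerDoc(perDoc, ordinals.length)));
    }
}
```

The bit-at-a-time loops keep the sketch short and easy to check; a real encoder would presumably move whole words at a time, and how that compares with the VInt default is something only a benchmark can answer.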

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
