[ 
https://issues.apache.org/jira/browse/LUCENE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576094#comment-13576094
 ] 

Michael McCandless commented on LUCENE-4764:
--------------------------------------------

I re-tested trunk (base) vs this new DV format (comp), with all 9 dims on the
full 6.6M wikibig index.  The 2 added dims, username and categories, have a
very large number of unique values:

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
              HighPhrase       13.68      (8.1%)       13.64      (8.4%)   -0.3% ( -15% -   17%)
               LowPhrase       15.05      (4.4%)       15.08      (4.4%)    0.1% (  -8% -    9%)
             LowSpanNear        7.12      (2.5%)        7.17      (2.3%)    0.6% (  -4% -    5%)
              AndHighLow       64.03      (1.3%)       64.55      (1.3%)    0.8% (  -1% -    3%)
        HighSloppyPhrase        0.82      (5.7%)        0.83      (4.8%)    1.1% (  -8% -   12%)
                 Respell       44.90      (4.0%)       45.43      (4.3%)    1.2% (  -6% -    9%)
         LowSloppyPhrase       15.37      (2.1%)       15.57      (1.8%)    1.3% (  -2% -    5%)
            HighSpanNear        2.91      (1.8%)        2.95      (1.9%)    1.3% (  -2% -    5%)
                  Fuzzy2       28.55      (2.0%)       29.02      (2.1%)    1.7% (  -2% -    5%)
         MedSloppyPhrase       16.56      (1.2%)       16.94      (1.2%)    2.3% (   0% -    4%)
              AndHighMed       39.47      (0.8%)       40.40      (1.0%)    2.4% (   0% -    4%)
                  Fuzzy1       24.08      (1.3%)       24.73      (1.4%)    2.7% (   0% -    5%)
             MedSpanNear       17.70      (1.6%)       18.19      (1.6%)    2.8% (   0% -    6%)
               MedPhrase       41.06      (2.2%)       42.46      (2.6%)    3.4% (  -1% -    8%)
                 LowTerm       34.19      (0.9%)       35.69      (1.0%)    4.4% (   2% -    6%)
             AndHighHigh       11.92      (1.2%)       12.50      (1.1%)    4.9% (   2% -    7%)
                Wildcard       13.13      (1.8%)       14.43      (1.5%)    9.9% (   6% -   13%)
               OrHighMed        7.09      (2.7%)        7.85      (1.6%)   10.8% (   6% -   15%)
               OrHighLow        7.16      (2.3%)        7.93      (1.6%)   10.8% (   6% -   15%)
                HighTerm        7.59      (2.3%)        8.47      (1.6%)   11.5% (   7% -   15%)
                 MedTerm       20.14      (1.9%)       22.82      (1.1%)   13.3% (  10% -   16%)
                 Prefix3        5.78      (2.2%)        6.56      (1.5%)   13.4% (   9% -   17%)
              OrHighHigh        4.03      (2.3%)        4.65      (2.0%)   15.4% (  10% -   20%)
                  IntNRQ        1.92      (2.2%)        2.45      (1.9%)   27.5% (  22% -   32%)
{noformat}

The new DV format takes 145.3 MB vs 129.0 MB for trunk, i.e. ~12.6% bigger.
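
For anyone wanting to re-derive the percentages, here is a minimal Python sketch of the arithmetic. The helper name relative_increase is made up for illustration and is not part of any benchmark tool. The same formula applied to the rounded QPS figures in the table can differ from the printed Pct diff in the last digit, presumably because the table was computed from unrounded values.

```python
def relative_increase(base, comp):
    """Percent change of comp relative to base."""
    return 100.0 * (comp - base) / base

# Size of the new DV format (145.3 MB) relative to trunk (129.0 MB).
size_overhead = relative_increase(129.0, 145.3)
print(f"size overhead: {size_overhead:.1f}%")  # prints "size overhead: 12.6%"

# Same formula on the rounded IntNRQ QPS values from the table. This gives
# about 27.6%, slightly off from the 27.5% shown there.
intnrq_gain = relative_increase(1.92, 2.45)
print(f"IntNRQ gain from rounded QPS: {intnrq_gain:.1f}%")
```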
                
> Faster but more RAM/Disk consuming DocValuesFormat for facets
> -------------------------------------------------------------
>
>                 Key: LUCENE-4764
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4764
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.2, 5.0
>
>         Attachments: LUCENE-4764.patch
>
>
> The new default DV format for binary fields has a much more
> RAM-efficient encoding of the address for each document ... but it's
> also a bit slower at decode time, which affects facets because we
> decode for every collected docID.
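
To illustrate the tradeoff described above, here is a rough Python sketch contrasting two ways of storing the per-document start address of a variable-length binary value. This is not Lucene's actual encoding in either format. The block size, the 16-bit delta width, and all function names are assumptions chosen for illustration. The plain layout spends one 64-bit offset per document and needs a single array read per lookup. The block layout stores one base per block plus a small delta per document, so it is smaller but does extra arithmetic on every lookup.

```python
from array import array

BLOCK = 4  # hypothetical block size, kept small for illustration


def build_plain_addresses(lengths):
    """One absolute 64-bit start offset per document, plus a final end offset."""
    addresses = array("q", [0])
    for n in lengths:
        addresses.append(addresses[-1] + n)
    return addresses


def build_block_addresses(lengths):
    """One absolute base per block, plus a 16-bit delta per document.

    Assumes every block spans fewer than 65536 bytes.
    """
    plain = build_plain_addresses(lengths)
    bases = array("q")
    deltas = array("H")
    for i, addr in enumerate(plain):
        if i % BLOCK == 0:
            bases.append(addr)
        deltas.append(addr - bases[-1])
    return bases, deltas


def lookup_plain(addresses, doc):
    """Return (start, end) with two direct array reads."""
    return addresses[doc], addresses[doc + 1]


def lookup_block(bases, deltas, doc):
    """Return (start, end); each address costs a divide, two reads and an add."""
    def addr(i):
        return bases[i // BLOCK] + deltas[i]
    return addr(doc), addr(doc + 1)


# Byte lengths of the binary value for 7 hypothetical documents.
lengths = [3, 0, 7, 2, 5, 1, 4]
plain = build_plain_addresses(lengths)
bases, deltas = build_block_addresses(lengths)

# Both layouts must resolve every document to the same byte range.
for doc in range(len(lengths)):
    assert lookup_plain(plain, doc) == lookup_block(bases, deltas, doc)

plain_bytes = plain.itemsize * len(plain)
block_bytes = bases.itemsize * len(bases) + deltas.itemsize * len(deltas)
print(plain_bytes, block_bytes)
```

Since faceting decodes an address for every collected docID, the extra per-lookup work in a compact layout is paid once per hit, which is the cost the issue description refers to.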
