[ https://issues.apache.org/jira/browse/LUCENE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-4599:
---------------------------------

    Attachment: highlightNoStop.tasks

I ran the highlight tasks from luceneutil (I had to remove stop words first, 
see attached tasks file). The index contains 500k docs from wikibig and fully 
fits in the O/S cache.

{noformat}
                Task  QPS Lucene40    StdDev  QPS Compressing    StdDev                 Pct diff
             LowTerm        411.77    (3.8%)           268.23    (2.1%)   -34.9% ( -39% -  -30%)
             MedTerm        352.85    (5.2%)           242.88    (3.8%)   -31.2% ( -38% -  -23%)
            HighTerm        231.72    (6.7%)           177.52    (5.8%)   -23.4% ( -33% -  -11%)
           LowPhrase        226.94    (3.7%)           177.07    (1.9%)   -22.0% ( -26% -  -16%)
          AndHighMed        136.94    (2.2%)           119.63    (1.8%)   -12.6% ( -16% -   -8%)
           OrHighLow        124.73    (5.2%)           111.01    (3.6%)   -11.0% ( -18% -   -2%)
           OrHighMed         82.16    (6.8%)            75.91    (4.7%)    -7.6% ( -17% -    4%)
          OrHighHigh         73.41    (5.9%)            68.18    (6.1%)    -7.1% ( -18% -    5%)
           MedPhrase         34.41    (5.5%)            32.28    (8.1%)    -6.2% ( -18% -    7%)
          HighPhrase         44.10    (3.9%)            41.47    (4.9%)    -6.0% ( -14% -    2%)
     LowSloppyPhrase         36.07    (4.8%)            33.93    (5.4%)    -5.9% ( -15% -    4%)
         AndHighHigh         49.13    (2.6%)            46.27    (2.2%)    -5.8% ( -10% -   -1%)
     MedSloppyPhrase         12.84    (3.8%)            12.20    (6.2%)    -5.0% ( -14% -    5%)
    HighSloppyPhrase         13.20    (4.6%)            12.58    (6.5%)    -4.7% ( -15% -    6%)
         LowSpanNear          7.94   (10.7%)             7.69    (8.2%)    -3.2% ( -19% -   17%)
        HighSpanNear          5.32    (3.6%)             5.24    (4.4%)    -1.6% (  -9% -    6%)
          AndHighLow       3780.85    (4.0%)          3756.77    (7.5%)    -0.6% ( -11% -   11%)
            PKLookup        341.94    (2.2%)           340.85    (2.4%)    -0.3% (  -4% -    4%)
             Prefix3        122.64    (3.8%)           122.60    (4.3%)    -0.0% (  -7% -    8%)
            Wildcard        188.27    (3.2%)           188.23    (3.2%)    -0.0% (  -6% -    6%)
              IntNRQ        136.55    (7.2%)           137.57    (7.4%)     0.7% ( -12% -   16%)
         MedSpanNear         40.54    (6.0%)            40.94    (6.4%)     1.0% ( -10% -   14%)
             Respell         58.13    (4.0%)            59.35    (3.5%)     2.1% (  -5% -    9%)
              Fuzzy2         55.24    (5.7%)            57.72    (7.8%)     4.5% (  -8% -   19%)
              Fuzzy1         76.40    (5.9%)            83.67    (4.0%)     9.5% (   0% -   20%)
{noformat}
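
For anyone reading the table above, the central "Pct diff" figure appears to be 
the relative change in QPS from Lucene40 to Compressing. A minimal sketch to 
check a few rows, assuming that definition (the helper name pct_diff is 
hypothetical and not part of luceneutil; the bracketed ranges are not 
reproduced here):

```python
def pct_diff(baseline_qps: float, candidate_qps: float) -> float:
    """Relative change of candidate vs. baseline throughput, in percent."""
    return 100.0 * (candidate_qps - baseline_qps) / baseline_qps


# (task, QPS Lucene40, QPS Compressing), copied from the table above.
rows = [
    ("LowTerm", 411.77, 268.23),
    ("HighTerm", 231.72, 177.52),
    ("Fuzzy1", 76.40, 83.67),
]

for task, base, comp in rows:
    # Inputs are the rounded values shown in the table, so the last digit
    # could differ slightly from what the benchmark tool printed.
    print(f"{task:>10} {pct_diff(base, comp):6.1f}%")
```

A negative value means the Compressing format served fewer queries per second 
than Lucene40 for that task.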

The results are disappointing, and I'm surprised that some queries perform much 
worse (the term queries lose 23% to 35%) while others perform slightly better. 
I'll dig into this, but if anyone has an idea why (I'm not at all familiar with 
FastVectorHighlighter), I'd be interested to hear their theory!
                
> Compressed term vectors
> -----------------------
>
>                 Key: LUCENE-4599
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4599
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: core/codecs, core/termvectors
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 4.2
>
>         Attachments: 4599-dataimport-fail.log, 4599-zookeer-fail.log, 
> CompressingTVF_ingest_rate.png, highlightNoStop.tasks, 
> Lucene40TVF_ingest_rate.png, LUCENE-4599.patch, LUCENE-4599.patch, 
> LUCENE-4599.patch, solr.patch
>
>
> We should have codec-compressed term vectors, similar to what we have for 
> stored fields.
