[jira] [Updated] (LUCENE-6066) Collector that manages diversity in search results

Adrien Grand (JIRA) Mon, 09 Feb 2015 08:12:00 -0800

     [ https://issues.apache.org/jira/browse/LUCENE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Adrien Grand updated LUCENE-6066:
---------------------------------
    Attachment: LUCENE-6066.patch

Hi Mark,

I played with your patch to see if removing the code duplication of 
PriorityQueue would hurt the benchmark and everything looks ok:

{code}
                    Task   QPS baseline      StdDev   QPS patch      StdDev                Pct diff
         LowSloppyPhrase          69.77      (4.5%)       69.25      (3.8%)   -0.7% (  -8% -    7%)
                PKLookup         259.92      (3.2%)      258.50      (2.0%)   -0.5% (  -5% -    4%)
        HighSloppyPhrase          13.96      (5.1%)       13.92      (4.8%)   -0.3% (  -9% -   10%)
            OrNotHighLow        1135.87      (6.6%)     1132.89      (5.5%)   -0.3% ( -11% -   12%)
              AndHighLow        1075.94      (5.2%)     1073.63      (4.6%)   -0.2% (  -9% -   10%)
               LowPhrase         124.58      (2.0%)      124.49      (1.8%)   -0.1% (  -3% -    3%)
               MedPhrase          78.58      (1.5%)       78.56      (1.6%)   -0.0% (  -3% -    3%)
                 Prefix3          77.58      (4.7%)       77.59      (3.6%)    0.0% (  -7% -    8%)
              HighPhrase          14.14      (1.4%)       14.16      (1.6%)    0.1% (  -2% -    3%)
              AndHighMed         248.72      (3.9%)      249.23      (3.5%)    0.2% (  -6% -    7%)
                  Fuzzy1          72.16      (5.6%)       72.32      (6.2%)    0.2% ( -10% -   12%)
                HighTerm          71.70      (5.3%)       71.91      (5.1%)    0.3% (  -9% -   11%)
               OrHighLow          68.70      (5.4%)       68.91      (5.7%)    0.3% ( -10% -   11%)
                 MedTerm         220.94      (5.8%)      221.62      (5.4%)    0.3% ( -10% -   12%)
                Wildcard          20.86      (1.6%)       20.92      (1.4%)    0.3% (  -2% -    3%)
             LowSpanNear          16.46      (2.3%)       16.51      (2.3%)    0.3% (  -4% -    5%)
             MedSpanNear          18.46      (2.4%)       18.52      (2.1%)    0.3% (  -3% -    4%)
                  IntNRQ           6.63      (4.0%)        6.65      (4.0%)    0.4% (  -7% -    8%)
           OrHighNotHigh          38.52      (1.7%)       38.65      (1.5%)    0.4% (  -2% -    3%)
           OrNotHighHigh          79.04      (2.3%)       79.33      (1.8%)    0.4% (  -3% -    4%)
            OrHighNotMed          52.77      (1.9%)       52.97      (1.5%)    0.4% (  -2% -    3%)
         MedSloppyPhrase          44.24      (2.9%)       44.42      (2.5%)    0.4% (  -4% -    5%)
               OrHighMed          47.19      (5.2%)       47.37      (5.4%)    0.4% (  -9% -   11%)
            OrHighNotLow          85.13      (2.7%)       85.48      (2.1%)    0.4% (  -4% -    5%)
              OrHighHigh          26.42      (5.1%)       26.55      (5.0%)    0.5% (  -9% -   11%)
             AndHighHigh          84.14      (3.6%)       84.67      (3.0%)    0.6% (  -5% -    7%)
            HighSpanNear          50.80      (1.8%)       51.18      (1.4%)    0.7% (  -2% -    4%)
                  Fuzzy2          38.02      (8.4%)       38.54      (7.5%)    1.3% ( -13% -   18%)
                 LowTerm        1395.69      (8.9%)     1420.90      (8.4%)    1.8% ( -14% -   20%)
            OrNotHighMed         310.39      (4.4%)      316.65      (3.8%)    2.0% (  -5% -   10%)
                 Respell          82.66      (4.7%)       84.39      (4.4%)    2.1% (  -6% -   11%)
{code}
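
For anyone reading the numbers by hand, here is a minimal sketch of how the "Pct diff" column appears to relate to the two QPS columns. It assumes the column is the relative change of the patch QPS against the baseline QPS, which is consistent with the rows above but is not stated in the benchmark output itself. The function name `pct_diff` is made up for illustration.

```python
def pct_diff(baseline_qps, patch_qps):
    """Relative change of patch QPS versus baseline QPS, in percent (assumed formula)."""
    return 100.0 * (patch_qps - baseline_qps) / baseline_qps


# Two rows copied from the table above: (QPS baseline, QPS patch).
rows = {
    "LowSloppyPhrase": (69.77, 69.25),
    "Respell": (82.66, 84.39),
}

for task, (baseline, patch) in rows.items():
    print(f"{task:>24} {pct_diff(baseline, patch):+.1f}%")
```

In every row the point difference is smaller than the StdDev reported for either run, so the table gives no sign of a regression.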

I attached the patch that I tested with.

+1 to commit

> Collector that manages diversity in search results
> --------------------------------------------------
>
>                 Key: LUCENE-6066
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6066
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/query/scoring
>            Reporter: Mark Harwood
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: LUCENE-6066.patch, LUCENE-PQRemoveV8.patch, 
> LUCENE-PQRemoveV9.patch
>
>
> This issue provides a new collector for situations where a client doesn't 
> want more than N matches for any given key (e.g. no more than 5 products from 
> any one retailer in a marketplace). In these circumstances a document that 
> was previously thought of as competitive during collection has to be removed 
> from the final PQ and replaced with another doc (e.g. a retailer who already 
> has 5 matches in the PQ receives a 6th match which is better than its 
> previous ones). This requires a new remove method on the existing 
> PriorityQueue class.
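
The description above can be illustrated with a small standalone sketch. This is not the code in the attached patch and does not use Lucene's PriorityQueue; it is a Python toy built on `heapq`, and the names `diversified_top_n`, `hits`, `n` and `max_per_key` are invented for the example. It assumes `n >= 1`, `max_per_key >= 1` and unique doc ids. The `remove` helper stands in for the new remove method the issue asks for, and is a plain O(n) list removal here.

```python
import heapq
from collections import defaultdict


def diversified_top_n(hits, n, max_per_key):
    """Keep the n best (score, doc_id, key) hits, at most max_per_key per key."""
    global_pq = []               # min-heap of every hit currently kept
    per_key = defaultdict(list)  # key -> min-heap of the hits kept for that key

    def remove(heap, hit):
        # Stand-in for a PriorityQueue remove method: O(n) removal, then re-heapify.
        heap.remove(hit)
        heapq.heapify(heap)

    for hit in hits:
        key = hit[2]
        key_pq = per_key[key]
        if len(key_pq) >= max_per_key:
            # Key is full: the new hit can only displace this key's weakest hit.
            if hit <= key_pq[0]:
                continue
            weakest = heapq.heappop(key_pq)
            remove(global_pq, weakest)  # previously competitive doc leaves the PQ
        elif len(global_pq) >= n:
            # Queue is full: the new hit must beat the globally weakest hit.
            if hit <= global_pq[0]:
                continue
            weakest = heapq.heappop(global_pq)
            remove(per_key[weakest[2]], weakest)  # keep per-key state in sync
        heapq.heappush(global_pq, hit)
        heapq.heappush(key_pq, hit)

    return sorted(global_pq, reverse=True)


# Retailer "a" has three matches but may keep only two of them.
hits = [(1.0, 1, "a"), (2.0, 2, "a"), (3.0, 3, "a"), (0.5, 4, "b"), (0.4, 5, "c")]
top = diversified_top_n(hits, n=3, max_per_key=2)
```

In this run the third hit for "a" arrives when "a" already holds its two slots, so the weakest "a" hit is removed from the middle of the queue rather than from its top, which is the case a plain insert/pop queue cannot handle.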



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
