[jira] [Updated] (LUCENE-1536) if a filter can support random access API, we should use it

Robert Muir (Updated) (JIRA) Mon, 10 Oct 2011 10:03:58 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Robert Muir updated LUCENE-1536:
--------------------------------

    Attachment: LUCENE-1536_hack.patch

Hack patch that computes the heuristic up front in weight init, so that all 
segments are scored consistently and the proper scoresDocsOutOfOrder is 
returned for BS1.
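
To make the "decide once, up front" idea concrete, here is a rough sketch. It 
is not the code in the attached patch: the class, the enum, the density-based 
rule and its threshold are all hypothetical, and the link between the chosen 
strategy and out-of-order scoring is an assumption made for illustration.

```java
final class UpFrontStrategySketch {
    enum Strategy { RANDOM_ACCESS, ITERATOR }

    // Hypothetical cut-off; the real heuristic is not described in this message.
    static final double HYPOTHETICAL_DENSITY_THRESHOLD = 0.01;

    private final Strategy strategy;        // fixed for the whole query
    private final boolean innerOutOfOrder;  // would the wrapped query score out of order?

    // "Weight init": the decision is made here, before any segment is scored.
    UpFrontStrategySketch(long filterCardinality, long maxDoc, boolean innerOutOfOrder) {
        double density = maxDoc == 0 ? 0.0 : (double) filterCardinality / maxDoc;
        this.strategy = density >= HYPOTHETICAL_DENSITY_THRESHOLD
                ? Strategy.RANDOM_ACCESS
                : Strategy.ITERATOR;
        this.innerOutOfOrder = innerOutOfOrder;
    }

    // Every segment gets the same answer because nothing is recomputed per segment.
    Strategy strategyForSegment(int segmentOrd) {
        return strategy;
    }

    // Assumption: walking the filter as an iterator needs in-order scoring, so
    // out-of-order scoring is only reported when the filter is checked by random access.
    boolean scoresDocsOutOfOrder() {
        return strategy == Strategy.RANDOM_ACCESS && innerOutOfOrder;
    }

    public static void main(String[] args) {
        UpFrontStrategySketch dense = new UpFrontStrategySketch(500_000, 1_000_000, true);
        System.out.println(dense.strategyForSegment(0) + " " + dense.scoresDocsOutOfOrder());
    }
}
```

Because the strategy is a final field, the answer given to the collector 
cannot disagree with what any individual segment ends up doing.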

Uwe's new test (the nestedFilterQuery) doesn't pass yet; I don't know why.

I recomputed the benchmarks:
{noformat}
                Task   QPS trunk  StdDev trunk   QPS patch  StdDev patch      Pct diff
          PhraseF1.0       11.99          0.20        7.79          0.23  -37% -  -31%
            TermF0.5      135.14          7.62      116.57          0.36  -18% -   -8%
   AndHighHighF100.0       17.34          0.78       15.44          0.15  -15% -   -5%
    AndHighHighF95.0       17.28          0.66       15.48          0.17  -14% -   -5%
    AndHighHighF90.0       17.31          0.76       15.58          0.19  -14% -   -4%
    AndHighHighF99.0       17.05          1.02       15.45          0.17  -15% -   -2%
    AndHighHighF75.0       17.47          0.78       16.03          0.15  -12% -   -3%
     AndHighHighF5.0       20.69          0.95       19.78          0.23   -9% -    1%
     AndHighHighF1.0       35.11          1.46       33.64          0.36   -8% -    1%
     AndHighHighF0.1      136.04          3.70      132.00          1.41   -6% -    0%
         AndHighHigh       18.25          0.70       17.74          0.20   -7% -    2%
     AndHighHighF0.5       49.84          1.72       48.58          0.49   -6% -    1%
            TermF0.1      351.18         11.01      345.85          1.73   -4% -    2%
        Fuzzy2F100.0       95.52          4.21       94.33          2.07   -7% -    5%
  SloppyPhraseF100.0        8.01          0.28        7.91          0.09   -5% -    3%
         Fuzzy2F90.0       95.42          3.86       94.51          1.74   -6% -    5%
         Fuzzy2F95.0       95.20          4.86       94.33          1.83   -7% -    6%
          Fuzzy1F1.0       54.02          1.67       53.56          1.07   -5% -    4%
          PhraseF2.0        7.73          0.07        7.68          0.18   -3% -    2%
   SloppyPhraseF99.0        7.99          0.23        7.95          0.10   -4% -    3%
    AndHighHighF50.0       17.54          0.79       17.46          0.12   -5% -    4%
          Fuzzy2F0.1      105.39          3.93      105.34          3.74   -7% -    7%
      SpanNearF100.0        3.16          0.06        3.16          0.04   -2% -    2%
         Fuzzy2F99.0       94.02          6.86       94.21          1.97   -8% -   10%
         Fuzzy2F75.0       95.56          3.51       95.76          2.02   -5% -    6%
        WildcardF2.0       52.79          0.27       53.05          0.57   -1% -    2%
          Fuzzy1F0.5       58.12          1.83       58.43          1.22   -4% -    5%
          PhraseF0.1       66.34          0.78       66.73          1.68   -3% -    4%
    SloppyPhraseF0.1       56.15          1.52       56.79          0.64   -2% -    5%
        SloppyPhrase        8.08          0.26        8.18          0.08   -2% -    5%
            PKLookup      176.59          5.07      178.96          5.71   -4% -    7%
        SpanNearF0.1       32.36          0.56       32.83          0.54   -1% -    4%
      OrHighHighF0.1       78.20          0.52       79.44          0.74    0% -    3%
   SloppyPhraseF95.0        7.91          0.08        8.05          0.09    0% -    3%
              Fuzzy2       94.87          3.72       96.49          1.62   -3% -    7%
      OrHighHighF0.5       31.41          0.47       31.96          0.33    0% -    4%
       SpanNearF99.0        3.12          0.06        3.18          0.03    0% -    4%
        WildcardF0.5       61.97          0.56       63.28          0.82    0% -    4%
          PhraseF0.5       19.78          0.26       20.29          0.31    0% -    5%
            SpanNear        3.19          0.08        3.27          0.05   -1% -    6%
        WildcardF0.1       67.45          0.64       69.24          0.89    0% -    4%
   SloppyPhraseF90.0        8.00          0.29        8.21          0.12   -2% -    8%
       SpanNearF95.0        3.13          0.04        3.23          0.03    1% -    5%
            Wildcard       43.19          0.34       44.64          1.40    0% -    7%
         Fuzzy2F50.0       95.12          4.22       98.69          2.28   -2% -   11%
              Fuzzy1       55.28          4.53       57.68          0.76   -4% -   15%
          OrHighHigh       12.13          0.99       12.71          0.43   -6% -   18%
              Phrase        3.60          0.04        3.81          0.04    3% -    7%
       SpanNearF90.0        3.15          0.05        3.35          0.04    3% -    9%
                Term       71.69          0.40       76.53          4.13    0% -   13%
         PhraseF99.0        3.43          0.03        3.68          0.04    5% -    9%
        PhraseF100.0        3.39          0.05        3.67          0.04    5% -   10%
   SloppyPhraseF75.0        8.04          0.26        8.74          0.12    3% -   13%
         Fuzzy2F20.0       97.38          4.03      106.17          2.88    1% -   16%
         PhraseF95.0        3.38          0.03        3.70          0.05    6% -   11%
         PhraseF90.0        3.42          0.02        3.76          0.03    8% -   11%
         Fuzzy2F10.0       97.19          3.69      109.27          3.23    5% -   20%
         PhraseF75.0        3.44          0.02        3.94          0.04   12% -   16%
          Fuzzy2F5.0       96.77          4.17      112.60          3.30    8% -   25%
          Fuzzy1F0.1       73.61          2.43       86.22          2.79    9% -   25%
      WildcardF100.0       35.49          0.33       41.92          1.05   14% -   22%
       SpanNearF75.0        3.15          0.07        3.72          0.04   14% -   22%
       WildcardF95.0       35.43          0.24       41.90          0.99   14% -   21%
       WildcardF90.0       35.59          0.32       42.11          1.11   14% -   22%
       WildcardF99.0       35.43          0.34       41.94          1.09   14% -   22%
         Fuzzy1F99.0       47.41          1.79       56.45          0.78   13% -   25%
       WildcardF75.0       35.64          0.29       42.51          0.87   15% -   22%
        Fuzzy1F100.0       46.85          1.83       56.42          0.55   14% -   26%
          Fuzzy2F1.0       96.75          3.75      116.85          4.32   11% -   30%
         Fuzzy1F95.0       46.91          1.37       56.69          0.69   15% -   25%
          Fuzzy2F0.5       97.33          4.15      117.64          4.17   11% -   30%
          Fuzzy2F2.0       95.51          3.65      115.66          3.95   12% -   30%
         Fuzzy1F90.0       46.84          1.95       56.83          0.78   14% -   28%
       WildcardF50.0       36.28          0.23       44.23          0.58   19% -   24%
            TermF1.0       93.99          4.90      114.60          0.49   15% -   29%
         Fuzzy1F75.0       47.12          1.68       58.11          0.82   17% -   29%
        WildcardF1.0       56.94          0.80       71.15          0.80   21% -   28%
         PhraseF50.0        3.49          0.01        4.39          0.04   24% -   27%
   SloppyPhraseF50.0        8.03          0.28       10.12          0.13   20% -   32%
         Fuzzy1F50.0       46.64          2.22       60.78          0.95   22% -   38%
      OrHighHighF1.0       24.15          0.35       32.46          0.25   31% -   37%
       WildcardF20.0       40.55          0.30       55.72          0.73   34% -   40%
         Fuzzy1F20.0       49.29          1.52       69.91          1.26   35% -   48%
        WildcardF5.0       47.02          0.33       67.11          0.81   40% -   45%
          PhraseF5.0        5.03          0.06        7.29          0.13   40% -   49%
       WildcardF10.0       43.04          0.57       62.68          0.69   42% -   49%
       SpanNearF50.0        3.16          0.07        4.77          0.06   45% -   56%
    AndHighHighF20.0       17.76          0.64       27.44          0.25   47% -   61%
         Fuzzy1F10.0       48.53          2.47       75.31          1.60   44% -   66%
          Fuzzy1F5.0       50.45          1.70       79.01          2.08   47% -   66%
          Fuzzy1F2.0       52.03          1.54       82.41          2.46   49% -   68%
    OrHighHighF100.0        7.69          0.20       12.19          0.33   50% -   67%
     OrHighHighF99.0        7.72          0.35       12.25          0.34   47% -   70%
         PhraseF20.0        3.74          0.03        5.95          0.05   56% -   61%
     OrHighHighF95.0        7.79          0.28       12.43          0.33   49% -   69%
      OrHighHighF2.0       19.60          0.24       31.28          0.14   56% -   62%
            TermF2.0       70.16          3.78      112.18          0.53   51% -   69%
     OrHighHighF90.0        7.83          0.23       12.56          0.33   51% -   69%
         PhraseF10.0        4.14          0.04        6.76          0.09   59% -   66%
           TermF50.0       42.57          1.77       70.63          1.25   56% -   76%
           TermF75.0       41.43          1.61       70.40          2.44   57% -   82%
     OrHighHighF75.0        7.81          0.22       13.33          0.36   61% -   80%
           TermF95.0       41.07          1.65       70.96          3.30   58% -   88%
           TermF99.0       41.12          1.57       71.14          3.41   58% -   88%
           TermF90.0       40.92          1.58       70.81          3.00   59% -   87%
          TermF100.0       40.01          0.73       71.10          3.36   66% -   89%
     OrHighHighF50.0        8.39          0.24       14.92          0.29   69% -   86%
    AndHighHighF10.0       18.68          0.61       36.17          0.23   86% -  101%
           TermF20.0       45.66          1.98       88.55          0.52   84% -  103%
      OrHighHighF5.0       14.89          0.30       29.12          0.20   90% -  100%
   SloppyPhraseF20.0        8.34          0.29       16.36          0.26   86% -  106%
            TermF5.0       52.71          1.88      105.42          0.46   92% -  108%
           TermF10.0       47.53          1.99       97.62          0.51   96% -  115%
     AndHighHighF2.0       26.32          1.16       54.83          0.27   98% -  119%
     OrHighHighF20.0       10.36          0.19       22.12          0.22  107% -  119%
     OrHighHighF10.0       12.31          0.35       26.43          0.23  106% -  122%
   SloppyPhraseF10.0        8.73          0.28       22.70          0.38  147% -  172%
    SloppyPhraseF0.5       19.21          0.58       52.13          0.61  160% -  183%
       SpanNearF20.0        3.20          0.05        8.98          0.16  171% -  189%
    SloppyPhraseF5.0        9.30          0.33       30.17          0.44  208% -  241%
    SloppyPhraseF1.0       13.84          0.44       46.77          0.64  223% -  253%
    SloppyPhraseF2.0       11.00          0.36       39.85          0.54  246% -  279%
       SpanNearF10.0        3.31          0.07       13.57          0.23  294% -  325%
        SpanNearF0.5        9.25          0.14       39.86          0.39  320% -  341%
        SpanNearF5.0        3.54          0.07       19.35          0.38  425% -  468%
        SpanNearF1.0        6.15          0.11       34.27          0.48  439% -  474%
        SpanNearF2.0        4.52          0.09       28.11          0.43  500% -  543%
{noformat}
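
The Pct diff column is a range. For a single point estimate, the relative 
change in mean QPS can be recomputed from the two QPS columns. The sketch below 
does only that arithmetic; the class and method names are made up for 
illustration, and it does not try to reproduce how the benchmark tool derives 
the range from the standard deviations.

```java
final class QpsChangeSketch {
    // Relative change in mean QPS, in percent: 100 * (patch / trunk - 1).
    static double percentChange(double trunkQps, double patchQps) {
        return 100.0 * (patchQps / trunkQps - 1.0);
    }

    public static void main(String[] args) {
        // Rows taken from the table above: { trunk QPS, patch QPS }.
        String[] tasks = { "PhraseF1.0", "TermF5.0", "SpanNearF2.0" };
        double[][] qps = { { 11.99, 7.79 }, { 52.71, 105.42 }, { 4.52, 28.11 } };
        for (int i = 0; i < tasks.length; i++) {
            System.out.printf("%-14s %+.1f%%%n", tasks[i], percentChange(qps[i][0], qps[i][1]));
        }
    }
}
```

For each of these three rows the point estimate falls inside the reported 
range, for example roughly -35% for PhraseF1.0 against a range of -37% to -31%.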
                
> if a filter can support random access API, we should use it
> -----------------------------------------------------------
>
>                 Key: LUCENE-1536
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1536
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 2.4
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>             Fix For: 4.0
>
>         Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536_hack.patch, changes-yonik-uwe.patch, luceneutil.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
>     10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, eg 1-4
>     means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
>     AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
>     95, 98, 99, 99.99999 (filter is non-null but all bits are set),
>     100 (filter=null, control)).
>   * Method high means I use random-access filter API in
>     IndexSearcher's main loop.  Method low means I use random-access
>     filter API down in SegmentTermDocs (just like deleted docs
>     today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
>     "high" (ie in IndexSearcher's search loop).
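
For readers new to the issue, the two ways of applying a filter that the 
description compares can be sketched roughly as follows. This is plain Java 
over java.util.BitSet, not Lucene code: the class and method names are made up, 
and the sorted int[] stands in for the docs a query would match.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class FilterApplicationSketch {

    // Iterator style: advance the query's matches and the filter's set bits
    // in lock-step, moving whichever side is behind until both agree.
    static List<Integer> applyAsIterator(int[] sortedQueryDocs, BitSet filter) {
        List<Integer> hits = new ArrayList<>();
        int i = 0;
        int filterDoc = filter.nextSetBit(0);
        while (i < sortedQueryDocs.length && filterDoc >= 0) {
            int queryDoc = sortedQueryDocs[i];
            if (queryDoc == filterDoc) {
                hits.add(queryDoc);
                i++;
                filterDoc = filter.nextSetBit(filterDoc + 1);
            } else if (queryDoc < filterDoc) {
                i++;                                      // query side is behind
            } else {
                filterDoc = filter.nextSetBit(queryDoc);  // filter side is behind
            }
        }
        return hits;
    }

    // Random-access style: the query drives, and each candidate doc is
    // checked against the filter with a single bit lookup.
    static List<Integer> applyAsRandomAccess(int[] sortedQueryDocs, BitSet filter) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : sortedQueryDocs) {
            if (filter.get(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] queryDocs = { 1, 3, 5, 8, 13 };
        BitSet filter = new BitSet();
        filter.set(3);
        filter.set(4);
        filter.set(8);
        filter.set(20);
        System.out.println(applyAsIterator(queryDocs, filter));
        System.out.println(applyAsRandomAccess(queryDocs, filter));
    }
}
```

Both methods return the same hits; they differ only in which side drives the 
loop, which is the distinction the benchmarks are measuring.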

