[jira] [Updated] (LUCENE-6645) BKD tree queries should use BitDocIdSet.Builder

Adrien Grand (JIRA) Fri, 03 Jul 2015 05:14:33 -0700

     [ 
https://issues.apache.org/jira/browse/LUCENE-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Adrien Grand updated LUCENE-6645:
---------------------------------
    Attachment: LUCENE-6645.patch

I played a bit with the benchmark and got similar results (1.76 sec for trunk 
and more than 4 sec with the patch). It's a worst case for BitDocIdSetBuilder 
given that it always starts by building a SparseFixedBitSet and eventually 
upgrades it to a FixedBitSet. Still, it's disappointing that it's so slow 
compared to building a FixedBitSet directly.

I've experimented with a more brute-force approach (see attached patch) that 
uses a plain int[] instead of a SparseFixedBitSet for the sparse case, and it 
seems to perform better: the benchmark runs in 1.76 sec on trunk and 2.70 sec 
with the patch when the builder buffers doc IDs in an int[] until it holds 
maxDoc / 128 of them. The time goes down to 1.96 sec with a threshold of 
maxDoc / 2048. Maybe this is what we should use instead of BitDocIdSetBuilder?
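
To make the idea concrete, here is a minimal sketch of such a builder. It is 
not the attached patch: the class name, the thresholdDivisor parameter and the 
build() return type are hypothetical, and java.util.BitSet stands in for 
FixedBitSet so that the example has no dependencies.

```java
import java.util.Arrays;
import java.util.BitSet;

/**
 * Hypothetical sketch, not the attached patch: buffers doc IDs in a plain
 * int[] while the set is sparse and switches to a dense bit set once
 * maxDoc / thresholdDivisor IDs have been buffered.
 * java.util.BitSet is an assumed stand-in for Lucene's FixedBitSet.
 */
final class IntBufferDocIdSetBuilder {
    private final int maxDoc;
    private final int threshold;        // e.g. maxDoc / 128 or maxDoc / 2048
    private int[] buffer = new int[16]; // sparse representation, unsorted
    private int size = 0;               // buffered IDs, duplicates included
    private BitSet dense = null;        // non-null once upgraded

    IntBufferDocIdSetBuilder(int maxDoc, int thresholdDivisor) {
        this.maxDoc = maxDoc;
        this.threshold = maxDoc / thresholdDivisor;
    }

    void add(int doc) {
        if (dense == null && size >= threshold) {
            upgrade();
        }
        if (dense != null) {
            dense.set(doc);
            return;
        }
        if (size == buffer.length) {
            buffer = Arrays.copyOf(buffer, buffer.length * 2);
        }
        buffer[size++] = doc;
    }

    /** Copies the buffered IDs into the dense bit set and drops the buffer. */
    private void upgrade() {
        dense = new BitSet(maxDoc);
        for (int i = 0; i < size; i++) {
            dense.set(buffer[i]);
        }
        buffer = null;
    }

    boolean isDense() {
        return dense != null;
    }

    /** Returns the collected doc IDs in increasing order, without duplicates. */
    int[] build() {
        if (dense != null) {
            return dense.stream().toArray();
        }
        int[] docs = Arrays.copyOf(buffer, size);
        Arrays.sort(docs);
        int n = 0;
        for (int i = 0; i < docs.length; i++) {
            if (n == 0 || docs[i] != docs[n - 1]) {
                docs[n++] = docs[i];
            }
        }
        return Arrays.copyOf(docs, n);
    }

    public static void main(String[] args) {
        // With maxDoc = 1 << 20 and a divisor of 2048 the threshold is 512.
        IntBufferDocIdSetBuilder b = new IntBufferDocIdSetBuilder(1 << 20, 2048);
        for (int doc = 0; doc < 1000; doc += 2) {
            b.add(doc);
        }
        System.out.println("dense=" + b.isDense() + " docs=" + b.build().length);
    }
}
```

In this sketch the sparse path pays for one sort at build time, and the 
threshold counts buffered IDs including duplicates, so a set with many 
repeated IDs upgrades earlier than strictly necessary. A real implementation 
would presumably return a DocIdSet rather than an int[].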

I tried to see how this affects our luceneutil benchmark and there is barely 
any change:

{noformat}
                Task  QPS baseline    StdDev     QPS patch    StdDev  Pct diff
              Fuzzy1         74.41   (18.3%)         69.59   (19.4%)     -6.5% ( -37% -   38%)
             LowTerm        761.39    (2.4%)        749.20    (3.6%)     -1.6% (  -7% -    4%)
        OrNotHighLow        877.81    (2.2%)        867.60    (5.3%)     -1.2% (  -8% -    6%)
        OrHighNotMed         76.63    (2.1%)         75.89    (2.7%)     -1.0% (  -5% -    3%)
             MedTerm        309.75    (1.3%)        306.86    (2.6%)     -0.9% (  -4% -    2%)
          OrHighHigh         26.86    (5.4%)         26.64    (3.3%)     -0.8% (  -9% -    8%)
       OrNotHighHigh         67.94    (1.0%)         67.42    (2.0%)     -0.8% (  -3% -    2%)
            HighTerm        132.28    (1.4%)        131.29    (1.7%)     -0.7% (  -3% -    2%)
             Respell         78.71    (2.8%)         78.14    (3.2%)     -0.7% (  -6% -    5%)
           LowPhrase        121.23    (0.8%)        120.47    (1.3%)     -0.6% (  -2% -    1%)
        OrHighNotLow        112.94    (2.3%)        112.25    (2.5%)     -0.6% (  -5% -    4%)
        OrNotHighMed        223.81    (2.4%)        222.52    (3.8%)     -0.6% (  -6% -    5%)
           OrHighLow         71.79    (4.3%)         71.39    (3.3%)     -0.6% (  -7% -    7%)
         MedSpanNear         23.33    (1.1%)         23.21    (1.8%)     -0.5% (  -3% -    2%)
         AndHighHigh         62.01    (1.9%)         61.71    (3.6%)     -0.5% (  -5% -    5%)
           OrHighMed         41.79    (5.5%)         41.61    (3.6%)     -0.4% (  -9% -    9%)
          AndHighMed         90.86    (2.0%)         90.61    (2.8%)     -0.3% (  -5% -    4%)
    HighSloppyPhrase         47.43    (4.6%)         47.33    (4.8%)     -0.2% (  -9% -    9%)
          HighPhrase         28.36    (1.6%)         28.30    (1.3%)     -0.2% (  -3% -    2%)
           MedPhrase        147.25    (1.4%)        146.99    (1.6%)     -0.2% (  -3% -    2%)
     LowSloppyPhrase         37.07    (2.2%)         37.03    (2.3%)     -0.1% (  -4% -    4%)
     MedSloppyPhrase        156.95    (3.7%)        156.80    (3.6%)     -0.1% (  -7% -    7%)
         LowSpanNear         29.05    (2.2%)         29.02    (2.0%)     -0.1% (  -4% -    4%)
       OrHighNotHigh         61.13    (1.5%)         61.08    (1.6%)     -0.1% (  -3% -    3%)
        HighSpanNear         15.36    (1.7%)         15.36    (1.8%)      0.0% (  -3% -    3%)
            Wildcard        111.57    (3.1%)        113.05    (2.1%)      1.3% (  -3% -    6%)
              IntNRQ          7.49    (7.3%)          7.60    (5.2%)      1.4% ( -10% -   14%)
             Prefix3         72.81    (4.6%)         74.18    (4.1%)      1.9% (  -6% -   11%)
          AndHighLow        974.36    (3.0%)        994.46    (2.9%)      2.1% (  -3% -    8%)
              Fuzzy2         47.42   (16.1%)         53.71   (16.5%)     13.3% ( -16% -   54%)
{noformat}

I suspect this is because our multi-term queries in this benchmark match some 
high-frequency terms so the upgrade to a FixedBitSet happens quickly anyway.

> BKD tree queries should use BitDocIdSet.Builder
> -----------------------------------------------
>
>                 Key: LUCENE-6645
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6645
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: LUCENE-6645.patch, LUCENE-6645.patch
>
>
> When I was iterating on BKD tree originally I remember trying to use this 
> builder (which makes a sparse bit set at first and then upgrades to dense if 
> enough bits get set) and being disappointed with its performance.
> I wound up just making a FixedBitSet every time, but this is obviously 
> wasteful for small queries.
> It could be the perf was poor because I was always .or'ing in DISIs that had 
> 512 - 1024 hits each time (the size of each leaf cell in the BKD tree)?  I 
> also had to make my own DISI wrapper around each leaf cell... maybe that was 
> the source of the slowness, not sure.
> I also sort of wondered whether the SmallDocSet in the spatial module (backed 
> by a SentinelIntSet) might be faster ... though it'd need to be sorted in the 
> end after building before returning to Lucene.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
