[ https://issues.apache.org/jira/browse/LUCENE-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-6645:
---------------------------------
    Attachment: LUCENE-6645.patch

I played a bit with the benchmark and got similar results (1.76 sec for trunk and more than 4 sec with the patch). It's a worst case for BitDocIdSetBuilder, given that it always starts by building a SparseFixedBitSet and eventually upgrades it to a FixedBitSet. Still, it's disappointing that it's so much slower than building a FixedBitSet directly.

I've experimented with a more brute-force approach (see attached patch) that uses a plain int[] instead of a SparseFixedBitSet for the sparse case, and it seems to perform better: the benchmark runs in 1.76 sec on trunk and 2.70 sec with the patch if the builder is configured to use an int[] for up to maxDoc / 128 docs. It goes down to 1.96 sec with a threshold of maxDoc / 2048. Maybe this is what we should use instead of BitDocIdSetBuilder?

I tried to see how this affects our luceneutil benchmark and there is barely any change:

{noformat}
Task             QPS baseline   StdDev   QPS patch   StdDev   Pct diff
Fuzzy1                  74.41  (18.3%)       69.59  (19.4%)    -6.5% ( -37% -  38%)
LowTerm                761.39   (2.4%)      749.20   (3.6%)    -1.6% (  -7% -   4%)
OrNotHighLow           877.81   (2.2%)      867.60   (5.3%)    -1.2% (  -8% -   6%)
OrHighNotMed            76.63   (2.1%)       75.89   (2.7%)    -1.0% (  -5% -   3%)
MedTerm                309.75   (1.3%)      306.86   (2.6%)    -0.9% (  -4% -   2%)
OrHighHigh              26.86   (5.4%)       26.64   (3.3%)    -0.8% (  -9% -   8%)
OrNotHighHigh           67.94   (1.0%)       67.42   (2.0%)    -0.8% (  -3% -   2%)
HighTerm               132.28   (1.4%)      131.29   (1.7%)    -0.7% (  -3% -   2%)
Respell                 78.71   (2.8%)       78.14   (3.2%)    -0.7% (  -6% -   5%)
LowPhrase              121.23   (0.8%)      120.47   (1.3%)    -0.6% (  -2% -   1%)
OrHighNotLow           112.94   (2.3%)      112.25   (2.5%)    -0.6% (  -5% -   4%)
OrNotHighMed           223.81   (2.4%)      222.52   (3.8%)    -0.6% (  -6% -   5%)
OrHighLow               71.79   (4.3%)       71.39   (3.3%)    -0.6% (  -7% -   7%)
MedSpanNear             23.33   (1.1%)       23.21   (1.8%)    -0.5% (  -3% -   2%)
AndHighHigh             62.01   (1.9%)       61.71   (3.6%)    -0.5% (  -5% -   5%)
OrHighMed               41.79   (5.5%)       41.61   (3.6%)    -0.4% (  -9% -   9%)
AndHighMed              90.86   (2.0%)       90.61   (2.8%)    -0.3% (  -5% -   4%)
HighSloppyPhrase        47.43   (4.6%)       47.33   (4.8%)    -0.2% (  -9% -   9%)
HighPhrase              28.36   (1.6%)       28.30   (1.3%)    -0.2% (  -3% -   2%)
MedPhrase              147.25   (1.4%)      146.99   (1.6%)    -0.2% (  -3% -   2%)
LowSloppyPhrase         37.07   (2.2%)       37.03   (2.3%)    -0.1% (  -4% -   4%)
MedSloppyPhrase        156.95   (3.7%)      156.80   (3.6%)    -0.1% (  -7% -   7%)
LowSpanNear             29.05   (2.2%)       29.02   (2.0%)    -0.1% (  -4% -   4%)
OrHighNotHigh           61.13   (1.5%)       61.08   (1.6%)    -0.1% (  -3% -   3%)
HighSpanNear            15.36   (1.7%)       15.36   (1.8%)     0.0% (  -3% -   3%)
Wildcard               111.57   (3.1%)      113.05   (2.1%)     1.3% (  -3% -   6%)
IntNRQ                   7.49   (7.3%)        7.60   (5.2%)     1.4% ( -10% -  14%)
Prefix3                 72.81   (4.6%)       74.18   (4.1%)     1.9% (  -6% -  11%)
AndHighLow             974.36   (3.0%)      994.46   (2.9%)     2.1% (  -3% -   8%)
Fuzzy2                  47.42  (16.1%)       53.71  (16.5%)    13.3% ( -16% -  54%)
{noformat}

I suspect this is because the multi-term queries in this benchmark match some high-frequency terms, so the upgrade to a FixedBitSet happens quickly anyway.

> BKD tree queries should use BitDocIdSet.Builder
> -----------------------------------------------
>
>                 Key: LUCENE-6645
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6645
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: LUCENE-6645.patch, LUCENE-6645.patch
>
>
> When I was iterating on BKD tree originally I remember trying to use this
> builder (which makes a sparse bit set at first and then upgrades to dense if
> enough bits get set) and being disappointed with its performance.
> I wound up just making a FixedBitSet every time, but this is obviously
> wasteful for small queries.
> It could be the perf was poor because I was always .or'ing in DISIs that had
> 512 - 1024 hits each time (the size of each leaf cell in the BKD tree)? I
> also had to make my own DISI wrapper around each leaf cell... maybe that was
> the source of the slowness, not sure.
> I also sort of wondered whether the SmallDocSet in spatial module (backed by
> a SentinelIntSet) might be faster ...
> though it'd need to be sorted in the end after building before returning to Lucene.
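To make the int[]-then-dense idea from the comment concrete, here is a minimal sketch under stated assumptions. It is not the attached patch. The class `IntArrayDocSetBuilder` and its methods are hypothetical. `java.util.BitSet` stands in for Lucene's FixedBitSet so the example compiles without Lucene, and `build()` returns a sorted `int[]` rather than a DocIdSet. The sparse path has to sort and deduplicate before returning, which is the same cost the issue description anticipates for a SentinelIntSet-backed SmallDocSet.

```java
import java.util.Arrays;
import java.util.BitSet;

/**
 * Hypothetical sketch (not the attached patch): collects doc IDs in a plain int[]
 * while the set is sparse, and upgrades to a dense bit set once the buffer would
 * exceed maxDoc / thresholdDivisor entries. java.util.BitSet stands in for
 * Lucene's FixedBitSet so the example has no Lucene dependency.
 */
public class IntArrayDocSetBuilder {
    private final int maxDoc;
    private final int threshold;  // max buffered entries (duplicates included) before upgrading
    private int[] buffer = new int[16];
    private int size = 0;
    private BitSet dense = null;  // non-null once we have upgraded

    public IntArrayDocSetBuilder(int maxDoc, int thresholdDivisor) {
        this.maxDoc = maxDoc;
        this.threshold = Math.max(1, maxDoc / thresholdDivisor);
    }

    public void add(int doc) {
        if (doc < 0 || doc >= maxDoc) {
            throw new IllegalArgumentException("doc out of range: " + doc);
        }
        if (dense == null && size == threshold) {
            upgradeToDense();  // sparse buffer is full: copy it into a bit set
        }
        if (dense != null) {
            dense.set(doc);
            return;
        }
        if (size == buffer.length) {
            // Grow geometrically, but never beyond the upgrade threshold.
            buffer = Arrays.copyOf(buffer, Math.min(threshold, buffer.length * 2));
        }
        buffer[size++] = doc;
    }

    private void upgradeToDense() {
        dense = new BitSet(maxDoc);
        for (int i = 0; i < size; i++) {
            dense.set(buffer[i]);
        }
        buffer = null;  // the int[] is no longer needed
    }

    public boolean isDense() {
        return dense != null;
    }

    /** Returns the collected doc IDs in increasing order, without duplicates. */
    public int[] build() {
        if (dense != null) {
            return dense.stream().toArray();  // BitSet streams set bits in increasing order
        }
        // Docs can arrive out of order (e.g. from different BKD leaf cells),
        // so the sparse path has to sort and deduplicate before returning.
        int[] docs = Arrays.copyOf(buffer, size);
        Arrays.sort(docs);
        int unique = 0;
        for (int doc : docs) {
            if (unique == 0 || doc != docs[unique - 1]) {
                docs[unique++] = doc;
            }
        }
        return Arrays.copyOf(docs, unique);
    }

    public static void main(String[] args) {
        IntArrayDocSetBuilder builder = new IntArrayDocSetBuilder(1 << 20, 128);
        builder.add(42);
        builder.add(7);
        builder.add(42);
        System.out.println(Arrays.toString(builder.build()) + " dense=" + builder.isDense());
    }
}
```

Running `main` prints `[7, 42] dense=false`, since three adds stay well under the 8192-entry threshold (2^20 / 128). The `thresholdDivisor` parameter plays the role of the maxDoc / 128 and maxDoc / 2048 settings from the comment. A larger divisor upgrades to the bit set sooner, which matches the faster 1.96 sec result reported for maxDoc / 2048 on this dense-heavy benchmark.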