[ https://issues.apache.org/jira/browse/LUCENE-7330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-7330:
---------------------------------
    Attachment: LUCENE-7330.patch

Here is a patch. It speeds up conjunctions thanks to two changes:

First, it removes the 'if (doc == NO_MORE_DOCS) return NO_MORE_DOCS;' at the top of doNext(). This check was needed because TwoPhaseConjunctionDISI extended ConjunctionDISI and it is illegal to call TwoPhaseIterator.matches() on NO_MORE_DOCS. I had to refactor a bit how the two-phase iterator is exposed, but I don't think it makes things more complicated.

Second, it adds a special case for the second least costly iterator so that we do not have to check whether it is already on the same document as the 'lead'. If you look at the implementation of doNext(), we currently have to guard the call to 'other.advance()' with an 'if (other.docID() < doc)', but we can actually skip that check for the second least costly iterator without changing the order in which iterators are invoked.

luceneutil reports the following numbers on wikimedium10m. There seems to be a noticeable gain, roughly 1% to 5%, for conjunction-based queries (And*, *Span* and *Phrase):

{noformat}
Task              QPS baseline    StdDev  QPS patch    StdDev  Pct diff
OrHighNotLow            128.17    (9.1%)     126.04    (8.6%)     -1.7% (-17% -  17%)
OrHighHigh               14.75    (6.5%)      14.54    (5.8%)     -1.4% (-12% -  11%)
OrHighMed                66.53    (6.2%)      65.65    (5.8%)     -1.3% (-12% -  11%)
OrHighLow                85.42    (7.3%)      84.51    (6.7%)     -1.1% (-14% -  13%)
Fuzzy1                   68.08   (10.9%)      67.37   (10.2%)     -1.0% (-19% -  22%)
OrHighNotMed            133.66    (8.5%)     132.33    (7.5%)     -1.0% (-15% -  16%)
OrHighNotHigh            64.83    (4.6%)      64.36    (4.3%)     -0.7% ( -9% -   8%)
OrNotHighLow           1150.80    (3.1%)    1144.91    (3.4%)     -0.5% ( -6% -   6%)
Fuzzy2                   61.60   (22.2%)      61.31   (14.0%)     -0.5% (-30% -  46%)
OrNotHighHigh            22.30    (2.7%)      22.23    (2.6%)     -0.3% ( -5% -   5%)
OrNotHighMed            155.90    (2.4%)     155.74    (2.7%)     -0.1% ( -5% -   5%)
Respell                  94.52    (1.9%)      94.69    (1.9%)      0.2% ( -3% -   4%)
Wildcard                 66.04    (4.6%)      66.50    (4.4%)      0.7% ( -7% -  10%)
Prefix3                 104.62    (4.7%)     105.54    (4.3%)      0.9% ( -7% -  10%)
HighTerm                 98.37    (5.3%)      99.65    (4.5%)      1.3% ( -8% -  11%)
AndHighLow              612.09    (3.0%)     620.90    (2.6%)      1.4% ( -3% -   7%)
MedTerm                 237.97    (4.9%)     241.93    (4.4%)      1.7% ( -7% -  11%)
IntNRQ                   18.72    (9.4%)      19.05    (7.7%)      1.7% (-13% -  20%)
LowSloppyPhrase         108.80    (1.7%)     111.16    (2.2%)      2.2% ( -1% -   6%)
MedPhrase               100.85    (2.2%)     103.08    (2.1%)      2.2% ( -2% -   6%)
MedSpanNear              71.08    (2.2%)      73.09    (2.2%)      2.8% ( -1% -   7%)
LowTerm                 623.38    (9.5%)     641.55    (7.7%)      2.9% (-12% -  22%)
HighPhrase               35.36    (3.2%)      36.42    (3.0%)      3.0% ( -3% -   9%)
LowSpanNear              92.47    (2.9%)      95.41    (2.8%)      3.2% ( -2% -   9%)
HighSloppyPhrase         31.99    (4.9%)      33.09    (4.8%)      3.5% ( -5% -  13%)
AndHighMed              223.42    (1.6%)     231.21    (1.9%)      3.5% (  0% -   7%)
MedSloppyPhrase          43.07    (2.5%)      45.13    (2.2%)      4.8% (  0% -   9%)
HighSpanNear             28.57    (2.9%)      29.95    (3.6%)      4.8% ( -1% -  11%)
AndHighHigh              74.55    (1.0%)      78.39    (1.6%)      5.2% (  2% -   7%)
LowPhrase                19.97    (2.5%)      21.04    (2.9%)      5.4% (  0% -  10%)
{noformat}

> Speed up conjunctions
> ---------------------
>
>                 Key: LUCENE-7330
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7330
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7330.patch
>
>
> I am digging into some performance regressions between 4.x and 5.x which seem
> to be due to how we always run conjunctions with ConjunctionDISI now while
> 4.x had FilteredQuery, which was optimized for the case that there are only
> two clauses or that one of the clauses supports random access. I'd like to
> explore the former in this issue.
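
For readers without the patch at hand, the two changes to doNext() described in the comment can be sketched roughly as follows. This is a self-contained illustration, not the code from LUCENE-7330.patch: ConjunctionSketch, ArrayDocs, and intersect() are hypothetical stand-ins that only mimic the docID()/nextDoc()/advance() contract of Lucene's DocIdSetIterator over sorted int arrays, and the field names lead1, lead2 and others are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: hypothetical stand-in classes, not Lucene's ConjunctionDISI.
final class ConjunctionSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical iterator over a sorted array of doc IDs. */
  static final class ArrayDocs {
    private final int[] docs;
    private int idx = -1; // -1 means "not positioned yet"

    ArrayDocs(int[] sortedDocs) { this.docs = sortedDocs; }

    int docID() {
      if (idx < 0) return -1;
      return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    int nextDoc() {
      if (idx < docs.length) idx++;
      return docID();
    }

    /** Moves to the first doc >= target and returns it, or NO_MORE_DOCS. */
    int advance(int target) {
      if (idx < 0) idx = 0;
      while (idx < docs.length && docs[idx] < target) idx++;
      return docID();
    }

    long cost() { return docs.length; }
  }

  private final ArrayDocs lead1;    // least costly iterator
  private final ArrayDocs lead2;    // second least costly iterator
  private final ArrayDocs[] others; // remaining iterators, by increasing cost

  ConjunctionSketch(List<ArrayDocs> iterators) {
    if (iterators.size() < 2) throw new IllegalArgumentException("need at least 2 iterators");
    List<ArrayDocs> sorted = new ArrayList<>(iterators);
    sorted.sort(Comparator.comparingLong(ArrayDocs::cost));
    lead1 = sorted.get(0);
    lead2 = sorted.get(1);
    others = sorted.subList(2, sorted.size()).toArray(new ArrayDocs[0]);
  }

  int nextDoc() { return doNext(lead1.nextDoc()); }

  private int doNext(int doc) {
    // Change 1: no early 'if (doc == NO_MORE_DOCS) return' here; the loop
    // below converges on NO_MORE_DOCS by itself once lead1 is exhausted.
    advanceHead:
    for (;;) {
      // Change 2: lead2 is always positioned before 'doc' at this point,
      // so it can be advanced without a 'lead2.docID() < doc' guard.
      int next2 = lead2.advance(doc);
      if (next2 != doc) {
        doc = lead1.advance(next2);
        if (next2 != doc) continue; // leads disagree, try again from the new doc
      }
      for (ArrayDocs other : others) {
        // 'other' may already be on 'doc' after an earlier restart, hence the guard.
        if (other.docID() < doc) {
          int next = other.advance(doc);
          if (next > doc) {
            doc = lead1.advance(next);
            continue advanceHead;
          }
        }
      }
      return doc; // every iterator is positioned on 'doc'
    }
  }

  /** Convenience helper (hypothetical): collects all docs common to the given sorted arrays. */
  static List<Integer> intersect(int[]... postings) {
    List<ArrayDocs> its = new ArrayList<>();
    for (int[] p : postings) its.add(new ArrayDocs(p));
    ConjunctionSketch c = new ConjunctionSketch(its);
    List<Integer> hits = new ArrayList<>();
    for (int d = c.nextDoc(); d != NO_MORE_DOCS; d = c.nextDoc()) hits.add(d);
    return hits;
  }
}
```

Calling ConjunctionSketch.intersect(new int[]{1, 3, 5, 7}, new int[]{3, 4, 5, 7, 9}, new int[]{0, 3, 7, 8}) walks the three lists and collects the docs 3 and 7. The unguarded lead2.advance(doc) is safe in this sketch because every path into the loop moves lead1 strictly past the document lead2 is on; the actual patch may justify and structure this differently.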