[ https://issues.apache.org/jira/browse/LUCENE-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439805#comment-16439805 ]
Adrien Grand commented on LUCENE-8142:
--------------------------------------

I gave this a try. {{ImpactsEnum}} has a new method {{getImpacts}} that returns impacts on multiple levels. This makes it natural to implement with a skip list. It might make it more challenging to back this information with another data structure, but it also has API benefits, such as removing references to {{SimScorer}} from {{TermsEnum.impacts}}.

On wikibigall, term queries improve because this change allows them to skip at any level, whereas before they could only skip on the first level. However, the somewhat heavier API seems to incur a slight slowdown for conjunctions and disjunctions. I don't think it is an issue, especially because this change improves testing by making it easier to compare impacts against indexed data. This API also means that we can now speed up queries that merge frequencies and norms rather than scores, like {{SynonymQuery}} and {{BlendedTermQuery}}, which was not possible before.
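To illustrate the idea of letting callers compute the maximum score from (freq, norm) pairs exposed on several levels, here is a minimal, self-contained Java sketch. It does not use Lucene's classes: {{Impact}}, {{Level}}, {{ScoreFunction}} and {{firstCandidateDoc}} are hypothetical stand-ins invented for this example, and the scoring function is a toy rather than a real similarity. It assumes levels are ordered from the smallest doc ID range to the largest, so the per-level maximum score never decreases as the level grows.

```java
import java.util.Arrays;
import java.util.List;

public class ImpactsSketch {

    /** Hypothetical stand-in for one (freq, norm) pair; not Lucene's own class. */
    public static final class Impact {
        final int freq;
        final long norm;
        public Impact(int freq, long norm) { this.freq = freq; this.norm = norm; }
    }

    /** Hypothetical level: impacts that bound every doc up to docIdUpTo (inclusive). */
    public static final class Level {
        final int docIdUpTo;
        final List<Impact> impacts;
        public Level(int docIdUpTo, List<Impact> impacts) {
            this.docIdUpTo = docIdUpTo;
            this.impacts = impacts;
        }
    }

    /** Hypothetical scoring function, supplied by the caller rather than the codec. */
    public interface ScoreFunction {
        float score(int freq, long norm);
    }

    /** The caller, not the enum, turns raw (freq, norm) pairs into a max score. */
    public static float maxScore(List<Impact> impacts, ScoreFunction scorer) {
        float max = 0f;
        for (Impact impact : impacts) {
            max = Math.max(max, scorer.score(impact.freq, impact.norm));
        }
        return max;
    }

    /**
     * Walks levels from the smallest range to the largest and returns the first
     * doc ID that may still reach minCompetitiveScore. Every level whose max
     * score is too low can be skipped entirely.
     */
    public static int firstCandidateDoc(int currentDoc, List<Level> levels,
                                        ScoreFunction scorer, float minCompetitiveScore) {
        int target = currentDoc;
        for (Level level : levels) {
            if (maxScore(level.impacts, scorer) >= minCompetitiveScore) {
                break; // this level may contain a competitive doc
            }
            target = level.docIdUpTo + 1; // nothing competitive up to docIdUpTo
        }
        return target;
    }

    public static void main(String[] args) {
        // Toy scoring function for illustration only.
        ScoreFunction toy = (freq, norm) -> (float) freq / (freq + norm);
        List<Level> levels = Arrays.asList(
            new Level(127, Arrays.asList(new Impact(1, 4), new Impact(2, 10))),
            new Level(1023, Arrays.asList(new Impact(3, 4), new Impact(6, 10))));
        System.out.println(firstCandidateDoc(0, levels, toy, 0.3f)); // prints 128
        System.out.println(firstCandidateDoc(0, levels, toy, 0.5f)); // prints 1024
    }
}
```

With a higher minimum competitive score, the second call skips past the second level as well, which is the kind of multi-level skipping that would explain the term query gains in the numbers below.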
{noformat}
AndHighHigh              83.36   (3.8%)    79.45   (3.1%)   -4.7% ( -11% -    2%)
OrHighHigh               34.42   (2.7%)    32.93   (2.0%)   -4.3% (  -8% -    0%)
AndHighMed              115.73   (3.3%)   111.67   (3.0%)   -3.5% (  -9% -    2%)
OrHighMed                24.44   (3.3%)    23.74   (2.1%)   -2.9% (  -8% -    2%)
OrHighLow              1952.31   (4.7%)  1912.93   (3.6%)   -2.0% (  -9% -    6%)
AndHighLow             1837.61   (4.1%)  1802.22   (3.9%)   -1.9% (  -9% -    6%)
Fuzzy1                  229.31   (9.8%)   226.03   (8.9%)   -1.4% ( -18% -   19%)
IntNRQ                   31.75  (14.0%)    31.36  (12.5%)   -1.2% ( -24% -   29%)
Fuzzy2                  194.10   (9.6%)   192.36  (11.6%)   -0.9% ( -20% -   22%)
MedSloppyPhrase          54.96   (4.7%)    54.62   (4.2%)   -0.6% (  -9% -    8%)
HighSloppyPhrase          6.21   (5.9%)     6.18   (5.7%)   -0.5% ( -11% -   11%)
LowSloppyPhrase          19.26   (4.4%)    19.19   (4.3%)   -0.4% (  -8% -    8%)
HighTermMonthSort       180.22   (9.8%)   179.53  (10.4%)   -0.4% ( -18% -   21%)
Wildcard                 60.86   (6.0%)    60.63   (6.3%)   -0.4% ( -11% -   12%)
Prefix3                  88.19   (8.3%)    87.89   (8.5%)   -0.3% ( -15% -   17%)
Respell                 195.14   (2.1%)   194.57   (2.5%)   -0.3% (  -4% -    4%)
HighPhrase               54.69   (1.6%)    54.72   (1.6%)    0.1% (  -3% -    3%)
MedPhrase                41.52   (1.8%)    41.56   (1.9%)    0.1% (  -3% -    3%)
LowPhrase                55.59   (1.8%)    55.68   (1.9%)    0.2% (  -3% -    3%)
MedSpanNear              28.55   (3.8%)    28.74   (3.8%)    0.7% (  -6% -    8%)
HighSpanNear             16.88   (4.6%)    17.03   (4.6%)    0.9% (  -7% -   10%)
LowSpanNear              14.50   (6.3%)    14.67   (6.2%)    1.1% ( -10% -   14%)
HighTermDayOfYearSort    61.22  (12.3%)    62.04  (12.4%)    1.3% ( -20% -   29%)
LowTerm                2478.52   (4.1%)  2692.79   (4.0%)    8.6% (   0% -   17%)
MedTerm                 835.85   (5.8%)  1323.83   (6.8%)   58.4% (  43% -   75%)
HighTerm                472.60   (6.8%)  1718.45  (15.6%)  263.6% ( 225% -  306%)
{noformat}

> Should codecs expose raw impacts?
> ---------------------------------
>
>                 Key: LUCENE-8142
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8142
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8142.patch
>
>
> Follow-up of LUCENE-4198. Currently, call-sites of TermsEnum.impacts provide
> a SimScorer so that the maximum score for the block can be computed. Should
> ImpactsEnum instead return the (freq,norm) pairs and let callers deal with
> max score computation?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)