[ https://issues.apache.org/jira/browse/LUCENE-7462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588818#comment-15588818 ]
Michael McCandless commented on LUCENE-7462:
--------------------------------------------

I also see good speedups to the otherwise "lightweight" queries:

{noformat}
Report after iter 19:
Task                QPS base   StdDev   QPS comp   StdDev   Pct diff
Prefix3                43.40   (5.2%)      42.48   (8.8%)   -2.1% ( -15% -  12%)
IntNRQ                 10.05   (8.8%)       9.87  (10.5%)   -1.8% ( -19% -  19%)
HighSpanNear           19.38   (5.2%)      19.14   (6.6%)   -1.2% ( -12% -  11%)
LowPhrase              19.34   (1.9%)      19.21   (3.6%)   -0.7% (  -6% -   4%)
PKLookup              350.45   (1.3%)     348.51   (2.8%)   -0.6% (  -4% -   3%)
MedSpanNear            41.12   (4.5%)      40.98   (4.7%)   -0.4% (  -9% -   9%)
Fuzzy1                115.35   (2.3%)     115.06   (2.8%)   -0.2% (  -5% -   5%)
LowSpanNear            85.93   (2.1%)      85.78   (2.3%)   -0.2% (  -4% -   4%)
MedPhrase              77.08   (2.7%)      77.03   (2.9%)   -0.1% (  -5% -   5%)
Respell                62.22   (2.2%)      62.26   (1.4%)    0.1% (  -3% -   3%)
Wildcard               37.39   (4.4%)      37.43   (5.8%)    0.1% (  -9% -  10%)
Fuzzy2                100.18   (2.0%)     100.31   (1.6%)    0.1% (  -3% -   3%)
LowSloppyPhrase        14.75   (4.9%)      14.79   (4.2%)    0.2% (  -8% -   9%)
HighPhrase              3.81   (5.2%)       3.82   (6.2%)    0.4% ( -10% -  12%)
AndHighLow            912.50   (2.5%)     916.11   (3.8%)    0.4% (  -5% -   6%)
OrNotHighLow          957.24   (2.5%)     963.91   (2.7%)    0.7% (  -4% -   6%)
MedSloppyPhrase        48.46   (4.8%)      48.80   (4.3%)    0.7% (  -8% -  10%)
AndHighMed             46.40   (1.7%)      46.87   (1.6%)    1.0% (  -2% -   4%)
AndHighHigh            43.36   (1.9%)      43.80   (1.9%)    1.0% (  -2% -   4%)
LowTerm               449.83   (2.5%)     454.76   (5.1%)    1.1% (  -6% -   8%)
HighSloppyPhrase       16.13   (6.8%)      16.34   (6.3%)    1.3% ( -11% -  15%)
OrNotHighMed           98.19   (3.2%)      99.56   (3.1%)    1.4% (  -4% -   7%)
OrNotHighHigh          21.69   (4.5%)      22.16   (4.8%)    2.2% (  -6% -  12%)
OrHighNotHigh          18.16   (7.7%)      18.75   (8.0%)    3.2% ( -11% -  20%)
OrHighNotMed           61.81   (9.4%)      64.27   (9.5%)    4.0% ( -13% -  25%)
MedTerm               123.87   (4.5%)     129.22   (3.3%)    4.3% (  -3% -  12%)
OrHighNotLow           25.19  (11.2%)      26.28  (11.5%)    4.4% ( -16% -  30%)
OrHighHigh             12.29   (7.4%)      12.96   (8.7%)    5.5% (  -9% -  23%)
OrHighMed              12.36   (7.4%)      13.09   (8.5%)    5.9% (  -9% -  23%)
HighTerm               38.51   (5.7%)      40.80   (4.4%)    5.9% (  -3% -  17%)
OrHighLow              19.42   (8.6%)      20.66   (9.7%)    6.4% ( -10% -  26%)
{noformat}

> Faster search APIs for doc values
> ---------------------------------
>
>                 Key: LUCENE-7462
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7462
>             Project: Lucene - Core
>          Issue Type: Improvement
>   Affects Versions: master (7.0)
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7462-advanceExact.patch
>
>
> While the iterator API helps deal with sparse doc values more efficiently, it
> also makes search-time operations more costly. For instance, the old
> random-access API allowed computing facets on a given segment without any
> conditionals, by just incrementing the counter at index {{ordinal+1}}, while
> the new API requires advancing the iterator if necessary and then checking
> whether it is exactly on the right document or not.
> Since it is very common for fields to exist across most documents, I suspect
> codecs will keep an internal structure that is similar to the current codec
> in the dense case, by having a dense representation of the data and just
> making the iterator skip over the minority of documents that do not have a
> value.
> I suggest that we add APIs that make things cheaper at search time. For
> instance in the case of SORTED doc values, it could look like
> {{LegacySortedDocValues}} with the additional restriction that documents can
> only be consumed in order. Codecs that can implement this API efficiently
> would hide it behind a {{SortedDocValues}} adapter, and then at search time
> facets and comparators (which liked the {{LegacySortedDocValues}} API better)
> would either unwrap or hide the SortedDocValues they got behind a more
> random-access API (which would only happen in the truly sparse case if the
> codec optimizes the dense case).
> One challenge is that we already use the same idea for hiding single-valued
> impls behind multi-valued impls, so we would need to enforce the order in
> which the wrapping needs to happen.
> At first sight, it seems that it would be
> best to do the single-value-behind-multi-value-API wrapping above the
> random-access-behind-iterator-API wrapping. The complexity of
> wrapping/unwrapping in the right order could be contained in the
> {{DocValues}} helper class.
> I think this change would also simplify search-time consumption of doc
> values, which currently needs to spend several lines of code positioning the
> iterator every time it needs to do something interesting with doc values.
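
To make the cost difference described in the issue concrete, here is a minimal, self-contained sketch of facet counting in three styles: the old random-access lookup, the iterator with an explicit advance-and-check, and an in-order exact lookup. It is not Lucene code. `SparseOrds`, `FacetCountSketch`, and all method names are hypothetical stand-ins; only the name `advanceExact` is borrowed from the attached patch's file name, and its semantics here are an assumption.

```java
/** Sketch only: every type here is a hypothetical stand-in, not a Lucene class. */
class FacetCountSketch {

  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical sparse SORTED doc values: only some docs carry an ordinal. */
  static final class SparseOrds {
    private final int[] docs; // doc IDs that have a value, ascending
    private final int[] ords; // ordinal for the matching entry in docs
    private int idx = -1;     // current position in docs, -1 before the first call

    SparseOrds(int[] docs, int[] ords) {
      this.docs = docs;
      this.ords = ords;
    }

    int docID() {
      if (idx < 0) return -1;
      return idx >= docs.length ? NO_MORE_DOCS : docs[idx];
    }

    /** Iterator style: move to the first doc with a value that is >= target. */
    int advance(int target) {
      do {
        idx++;
      } while (idx < docs.length && docs[idx] < target);
      return docID();
    }

    /** Assumed semantics: targets arrive in order; returns whether target has a value. */
    boolean advanceExact(int target) {
      while (idx + 1 < docs.length && docs[idx + 1] <= target) idx++;
      return idx >= 0 && docs[idx] == target;
    }

    int ordValue() {
      return ords[idx];
    }
  }

  /** Old random-access style: ordByDoc[doc] is -1 when the doc has no value. */
  static int[] countRandomAccess(int[] ordByDoc, int[] hits, int numOrds) {
    int[] counts = new int[numOrds + 1]; // slot 0 counts docs without a value
    for (int doc : hits) counts[ordByDoc[doc] + 1]++; // no conditionals
    return counts;
  }

  /** Iterator style: advance if necessary, then check we landed on the hit. */
  static int[] countWithIterator(SparseOrds dv, int[] hits, int numOrds) {
    int[] counts = new int[numOrds + 1];
    for (int doc : hits) {
      if (dv.docID() < doc) dv.advance(doc);
      if (dv.docID() == doc) counts[dv.ordValue() + 1]++;
      else counts[0]++;
    }
    return counts;
  }

  /** Exact-lookup style: one call per hit answers "does this doc have a value?". */
  static int[] countWithAdvanceExact(SparseOrds dv, int[] hits, int numOrds) {
    int[] counts = new int[numOrds + 1];
    for (int doc : hits) counts[dv.advanceExact(doc) ? dv.ordValue() + 1 : 0]++;
    return counts;
  }

  public static void main(String[] args) {
    int[] docs = {0, 2, 3, 5};
    int[] ords = {1, 0, 1, 2};
    int[] hits = {0, 1, 3, 4, 5}; // must be ascending
    int[] counts = countWithAdvanceExact(new SparseOrds(docs, ords), hits, 3);
    System.out.println(java.util.Arrays.toString(counts));
  }
}
```

All three methods produce the same counts for the same hits; the difference is how much positioning logic the caller has to carry in the loop body, which is the search-time overhead the issue is concerned with.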