jpountz opened a new pull request, #1052:
URL: https://github.com/apache/lucene/pull/1052
This commit adds a new `TermsEnumIndex` abstraction in `oal.index` that wraps a `TermsEnum` together with the index of the segment it belongs to. It can be used to build priority queues that merge `TermsEnum` instances, whether they come from the inverted index or from doc values. In both cases, a `long` holding the first 8 bytes of the current term is computed to speed up comparisons.

In the doc-values case, `OrdinalMap` also uses seek-by-ord to reason about the prefixes shared across entire windows of terms, so that these shared prefixes are not compared again every time the queue is reordered. This should especially help fields whose values tend to share long common prefixes, such as URLs.

On luceneutil's `OrdinalMap` benchmark, construction time decreased by 30.5% for the `id` field and by 17.5% for the `name` field.

JIRA: [LUCENE-10560](https://issues.apache.org/jira/browse/LUCENE-10560)
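The 8-byte `long` comparison trick can be sketched in plain Java. This is not Lucene's `TermsEnumIndex` or `OrdinalMap` code: `Cursor`, `prefix8`, `merge` and `cursor` are hypothetical names, and sorted in-memory byte arrays stand in for a `TermsEnum`. The sketch shows why the shortcut is safe. When the zero-padded 8-byte prefixes of two terms differ as unsigned longs, they order the terms exactly as a full unsigned byte comparison would, so the full comparison is only needed when the longs tie. The window-level shared-prefix optimization used for doc values is not modeled here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Sketch (hypothetical, not Lucene code): k-way merge of sorted term lists using a long prefix. */
final class TermsMergeSketch {

  /** Hypothetical stand-in for a TermsEnum plus the index of its segment. */
  static final class Cursor {
    final int segmentIndex;
    final byte[][] terms; // assumed sorted in unsigned byte order, without duplicates
    int pos = -1;
    long prefix8; // first 8 bytes of the current term, big-endian, zero-padded

    Cursor(int segmentIndex, byte[][] terms) {
      this.segmentIndex = segmentIndex;
      this.terms = terms;
    }

    byte[] term() { return terms[pos]; }

    /** Advances to the next term and caches its prefix; returns false when exhausted. */
    boolean next() {
      if (++pos >= terms.length) {
        return false;
      }
      prefix8 = prefix8(terms[pos]);
      return true;
    }
  }

  /** Packs the first 8 bytes of a term into a long, padding shorter terms with zero bytes. */
  static long prefix8(byte[] term) {
    long v = 0;
    for (int i = 0; i < 8; i++) {
      v <<= 8;
      if (i < term.length) {
        v |= term[i] & 0xFFL;
      }
    }
    return v;
  }

  /** The cheap unsigned long comparison decides most cases; ties fall back to the full term. */
  static int compare(Cursor a, Cursor b) {
    int cmp = Long.compareUnsigned(a.prefix8, b.prefix8);
    if (cmp == 0) {
      cmp = Arrays.compareUnsigned(a.term(), b.term());
    }
    if (cmp == 0) {
      cmp = Integer.compare(a.segmentIndex, b.segmentIndex); // deterministic tie-break
    }
    return cmp;
  }

  /** Emits each distinct term once, in sorted order, with the segments that contain it. */
  static List<String> merge(List<Cursor> cursors) {
    PriorityQueue<Cursor> pq = new PriorityQueue<>(TermsMergeSketch::compare);
    for (Cursor c : cursors) {
      if (c.next()) {
        pq.add(c);
      }
    }
    List<String> out = new ArrayList<>();
    while (!pq.isEmpty()) {
      byte[] term = pq.peek().term();
      List<Integer> segments = new ArrayList<>();
      // Pop every cursor positioned on this term, then advance it and re-insert it.
      while (!pq.isEmpty() && Arrays.equals(pq.peek().term(), term)) {
        Cursor c = pq.poll();
        segments.add(c.segmentIndex);
        if (c.next()) {
          pq.add(c);
        }
      }
      out.add(new String(term, StandardCharsets.UTF_8) + " " + segments);
    }
    return out;
  }

  /** Convenience builder for the example: UTF-8 encodes already-sorted ASCII strings. */
  static Cursor cursor(int segmentIndex, String... terms) {
    byte[][] bytes = new byte[terms.length][];
    for (int i = 0; i < terms.length; i++) {
      bytes[i] = terms[i].getBytes(StandardCharsets.UTF_8);
    }
    return new Cursor(segmentIndex, bytes);
  }

  public static void main(String[] args) {
    merge(List.of(
        cursor(0, "apple", "https://lucene.apache.org/a", "zebra"),
        cursor(1, "apple", "https://lucene.apache.org/b"),
        cursor(2, "https://lucene.apache.org/a"))).forEach(System.out::println);
  }
}
```

In the example, the three URL terms all start with `https://`, so their 8-byte prefixes tie and every comparison between them falls back to the full byte comparison. That is the kind of long-shared-prefix input the doc-values optimization described above is meant to help with.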