[GitHub] [lucene] jpountz opened a new pull request, #12055: Better skipping for multi-term queries with a FILTER rewrite.

GitBox Sun, 01 Jan 2023 03:38:56 -0800


jpountz opened a new pull request, #12055:
URL: https://github.com/apache/lucene/pull/12055


   Currently, multi-term queries with a filter rewrite internally rewrite to a 
disjunction if 16 or fewer terms match the query. Otherwise, the postings lists 
of all matching terms are collected into a `DocIdSetBuilder`. This change 
replaces the latter with a mixed approach: a disjunction is created between the 
16 terms that have the highest document frequency and an iterator produced from 
a `DocIdSetBuilder` that collects all other terms. On fields that have a 
Zipfian distribution, it's quite likely that no high-frequency terms make it to 
the `DocIdSetBuilder`. This provides two main benefits:
    - Queries are less likely to allocate a `FixedBitSet` of size `maxDoc`.
    - Queries are better at skipping or early terminating.
   
   On the other hand, queries that need to consume most or all matching 
documents may get a slowdown.
   
   The slowdown is unfortunate, but my gut feeling is that this change still 
has more pros than cons.
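   
   To illustrate the selection step described above, here is a minimal, 
self-contained sketch in plain Java. It is not the code from this pull request 
and does not use Lucene classes: `TermPartitionSketch`, `TermFreq`, `Partition` 
and `partition` are hypothetical names introduced for illustration only. It 
assumes the terms are partitioned with a bounded min-heap on document 
frequency, so that the 16 most frequent terms are kept for the disjunction and 
every evicted term is handed to the bulk collector (the role played by 
`DocIdSetBuilder` in the description).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TermPartitionSketch {

    /** Hypothetical stand-in for a matching term and its document frequency. */
    public static final class TermFreq {
        public final String term;
        public final int docFreq;

        public TermFreq(String term, int docFreq) {
            this.term = term;
            this.docFreq = docFreq;
        }
    }

    /** Terms kept for the disjunction, and terms left for the bulk collector. */
    public static final class Partition {
        public final List<TermFreq> highFreq;
        public final List<TermFreq> rest;

        Partition(List<TermFreq> highFreq, List<TermFreq> rest) {
            this.highFreq = highFreq;
            this.rest = rest;
        }
    }

    /** Keeps the {@code threshold} terms with the highest docFreq; all others go to {@code rest}. */
    public static Partition partition(Iterable<TermFreq> matchingTerms, int threshold) {
        // Min-heap on docFreq: the head is the least frequent term kept so far.
        PriorityQueue<TermFreq> top =
            new PriorityQueue<>(Comparator.comparingInt((TermFreq t) -> t.docFreq));
        List<TermFreq> rest = new ArrayList<>();
        for (TermFreq t : matchingTerms) {
            top.add(t);
            if (top.size() > threshold) {
                // Evict the least frequent term; in the real change its postings
                // would be collected in bulk instead of becoming a clause.
                rest.add(top.poll());
            }
        }
        return new Partition(new ArrayList<>(top), rest);
    }

    public static void main(String[] args) {
        List<TermFreq> terms = new ArrayList<>();
        for (int rank = 1; rank <= 40; rank++) {
            // Roughly Zipfian: docFreq is inversely proportional to rank.
            terms.add(new TermFreq("term" + rank, 1_000_000 / rank));
        }
        Partition p = partition(terms, 16);
        System.out.println("disjunction clauses: " + p.highFreq.size());  // 16
        System.out.println("terms sent to builder: " + p.rest.size());    // 24
    }
}
```

   With the roughly Zipfian input in `main`, the 24 terms left for the bulk 
collector all have a lower document frequency than any of the 16 disjunction 
clauses, which is why the collector is less likely to need a dense bit set.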
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

