Adrien Grand created LUCENE-7958:
------------------------------------

             Summary: Give TermInSetQuery better advancing capabilities
                 Key: LUCENE-7958
                 URL: https://issues.apache.org/jira/browse/LUCENE-7958
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Adrien Grand
            Priority: Minor


If a TermInSetQuery has more than 15 matching terms on a given segment, we
currently consume all postings lists into a bitset and return an iterator over
this bitset as a scorer. I would like to change it so that we keep the 15
postings lists that have the largest document frequencies and consume all other
(shorter) postings lists into a bitset. In the end we would return a disjunction
over those 15 longest postings lists and the bitset. This could reduce the
number of doc ids we consume when the TermInSetQuery is intersected with other
queries, especially if the document frequencies of the terms it wraps follow a
Zipfian distribution.
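
To illustrate the idea, here is a rough sketch in plain Java. It does not use
Lucene classes: sorted int arrays stand in for postings lists, java.util.BitSet
stands in for the per-segment bitset, and the names HybridTermSetSketch, Split,
split and advance are hypothetical, chosen only for this example. The real
change would presumably build on Lucene's own iterators instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;

public class HybridTermSetSketch {

    /** The n longest lists kept as-is, plus a bitset built from all shorter lists. */
    public static final class Split {
        public final List<int[]> longest;
        public final BitSet rest;
        public final int docsConsumed; // doc ids read eagerly to fill the bitset

        Split(List<int[]> longest, BitSet rest, int docsConsumed) {
            this.longest = longest;
            this.rest = rest;
            this.docsConsumed = docsConsumed;
        }
    }

    /** Each postings list is a sorted array of doc ids; its length plays the role of docFreq. */
    public static Split split(List<int[]> postings, int n, int maxDoc) {
        List<int[]> sorted = new ArrayList<>(postings);
        // Longest lists first.
        sorted.sort(Comparator.comparingInt((int[] p) -> p.length).reversed());
        int keep = Math.min(n, sorted.size());
        List<int[]> longest = new ArrayList<>(sorted.subList(0, keep));

        // Only the shorter lists are consumed up front.
        BitSet rest = new BitSet(maxDoc);
        int consumed = 0;
        for (int[] p : sorted.subList(keep, sorted.size())) {
            for (int doc : p) {
                rest.set(doc);
                consumed++;
            }
        }
        return new Split(longest, rest, consumed);
    }

    /** Smallest doc id >= target matched by the disjunction, or -1 if there is none. */
    public static int advance(Split s, int target) {
        int best = s.rest.nextSetBit(target); // -1 when no bit is set at or after target
        for (int[] p : s.longest) {
            int idx = Arrays.binarySearch(p, target);
            if (idx < 0) {
                idx = -idx - 1; // insertion point: first entry greater than target
            }
            if (idx < p.length && (best == -1 || p[idx] < best)) {
                best = p[idx];
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<int[]> postings = Arrays.asList(
            new int[] {1, 3, 5, 7, 9}, new int[] {2}, new int[] {4, 8}, new int[] {6});
        Split s = split(postings, 2, 10);
        System.out.println("eagerly consumed doc ids: " + s.docsConsumed);
        System.out.println("advance(6) = " + advance(s, 6));
    }
}
```

In this sketch the long lists are never read in full: advance only skips
within them, so an intersection that advances in large jumps touches few of
their entries, while the cost of filling the bitset is bounded by the short
lists.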



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

