[ 
https://issues.apache.org/jira/browse/LUCENE-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795905#comment-16795905
 ] 

Adrien Grand commented on LUCENE-7958:
--------------------------------------

Thanks for sharing [~hermes]. I should resurrect the above patch when I have 
some time!

> Give TermInSetQuery better advancing capabilities
> -------------------------------------------------
>
>                 Key: LUCENE-7958
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7958
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7958.patch
>
>
> If a TermInSetQuery has more than 15 matching terms on a given segment, then 
> we consume all postings lists into a bitset and return an iterator over this 
> bitset as a scorer. I would like to change it so that we keep the 15 postings 
> lists that have the largest document frequencies and consume all other 
> (shorter) postings lists into a bitset. In the end we return a disjunction 
> over the 15 longest postings lists and the bitset. This could help consume 
> fewer doc ids if the TermInSetQuery is intersected with other queries, 
> especially if the document frequencies of the terms it wraps have a Zipfian 
> distribution.
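
To make the proposal in the description concrete, here is a minimal sketch in plain Java. It is not the attached patch and it does not use Lucene's API: postings lists are modeled as sorted int arrays, and the names HybridDisjunctionSketch, Split, split and advance are all hypothetical. Real postings are iterators with skip data, so the binary search below only stands in for an efficient advance. The idea is that the longest lists stay as they are and are only advanced on demand, while the shorter ones are drained into a bitset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Lucene.
final class HybridDisjunctionSketch {

    /** The longest postings lists kept as-is, plus a bitset holding the union of the rest. */
    static final class Split {
        final List<int[]> kept;
        final BitSet rest;

        Split(List<int[]> kept, BitSet rest) {
            this.kept = kept;
            this.rest = rest;
        }
    }

    /** Keeps the maxKept longest lists and consumes every other list into a bitset. */
    static Split split(List<int[]> postings, int maxKept) {
        List<int[]> sorted = new ArrayList<>(postings);
        // Longest list (largest document frequency) first.
        sorted.sort(Comparator.comparingInt((int[] p) -> p.length).reversed());
        int keep = Math.min(maxKept, sorted.size());
        List<int[]> kept = new ArrayList<>(sorted.subList(0, keep));
        BitSet rest = new BitSet();
        for (int[] p : sorted.subList(keep, sorted.size())) {
            for (int doc : p) {
                rest.set(doc);
            }
        }
        return new Split(kept, rest);
    }

    /** Returns the smallest doc id >= target (target >= 0) in the disjunction, or -1 if none. */
    static int advance(Split s, int target) {
        int best = s.rest.nextSetBit(target); // -1 when the bitset has no such doc
        for (int[] p : s.kept) {
            // Jump straight to the target instead of reading every doc id before it.
            int idx = Arrays.binarySearch(p, target);
            if (idx < 0) {
                idx = -idx - 1; // insertion point: first element greater than target
            }
            if (idx < p.length && (best == -1 || p[idx] < best)) {
                best = p[idx];
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<int[]> postings = Arrays.asList(
            new int[] {1, 5, 9, 20, 33, 47},
            new int[] {2, 5, 40},
            new int[] {7},
            new int[] {9, 60});
        Split s = split(postings, 2); // keep the 2 longest lists; the issue proposes 15
        System.out.println(advance(s, 0));  // prints 1
        System.out.println(advance(s, 6));  // prints 7
        System.out.println(advance(s, 48)); // prints 60
    }
}
```

When another clause of a conjunction asks for a far-away target, the kept lists can skip ahead without their intermediate doc ids ever being read, which is where the savings described above would come from. Only the short lists pay the up-front cost of being fully consumed.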


