It sounds like a lot of complexity to handle an unusual edge case, but
... I guess this actually happened? Can you give any sense of the
end-user behavior that caused it?

On Fri, Nov 4, 2022 at 2:26 AM Patrick Zhai <zhai7...@gmail.com> wrote:
>
> Hi Froh,
>
> The idea sounds reasonable to me, altho I wonder whether using 
> CONSTANT_SCORE_BOOLEAN_REWRITE would help with your case since that dense 
> union case should be already handled by disjunction query I suppose?
> But that boolean rewrite is subject to max clause limit so it may have some 
> other problems depending on the use case I guess.
>
> Best
> Patrick
>
>
> On Thu, Nov 3, 2022 at 5:15 PM Michael Froh <msf...@gmail.com> wrote:
>>
>> Hi,
>>
>> I was recently poking around in the createWeight implementation for 
>> MultiTermQueryConstantScoreWrapper to get to the bottom of some slow 
>> queries, and I realized that the worst-case performance could be pretty bad, 
>> but (maybe) possible to optimize for.
>>
>> Imagine if we have a segment with N docs and our MultiTermQuery expands to 
>> hit M terms, where each of the M terms matches N-1 docs. (If we matched all 
>> N docs, then Greg Miller's recent optimization to replace the MultiTermQuery 
>> with a TermQuery would kick in.) In this case, my understanding is that we 
>> would iterate through all the terms and pass each one's postings to a 
>> DocIdSetBuilder, which will iterate through the postings to set bits. This 
>> whole thing would be O(MN), I think.
>>
>> I was thinking that it would be cool if the DocIdSetBuilder could detect 
>> long runs of set bits and advance() each DISI to skip over them (since 
>> they're guaranteed not to contribute anything new to the union). In the 
>> worst case that I described above, I think it would make the whole thing O(M 
>> log N) (assuming advance() takes log time).
>>
>> At the risk of overcomplicating things, maybe DocIdSetBuilder could use a 
>> third ("dense") BulkAdder implementation that kicks in once enough bits are 
>> set, to efficiently implement the "or" operation to skip over known 
>> (sufficiently long) runs of set bits?
>>
>> Would something like that be useful? Is the "dense union of doc IDs" case 
>> common enough to warrant it?
>>
>> Thanks,
>> Froh

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to