Re: Searching within very large subset of documents

Adrien Grand Mon, 04 Aug 2025 23:27:05 -0700

Hi Thomas,

Your question suggests that you are creating a huge BooleanQuery to
identify these documents. A TermInSetQuery should perform better.
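
A minimal sketch of what this could look like, assuming the subset is identified by a keyword field and the user's search is an ordinary query on a text field. The field names `id` and `body`, the class name `SubsetSearchSketch`, and the tiny in-memory index are hypothetical and only there to keep the example self-contained. The TermInSetQuery is added as a FILTER clause so it restricts the matches without contributing to scores.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermInSetQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BytesRef;

public class SubsetSearchSketch {

  // Builds a four-document index and counts the documents that match
  // "lucene" in the (hypothetical) "body" field AND belong to the subset.
  static int countMatches() throws Exception {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer =
          new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
        String[] bodies = {"lucene search", "lucene index", "other text", "lucene join"};
        for (int i = 0; i < bodies.length; i++) {
          Document doc = new Document();
          // Hypothetical key field: one untokenized term per document.
          doc.add(new StringField("id", "doc" + i, Field.Store.YES));
          doc.add(new TextField("body", bodies[i], Field.Store.NO));
          writer.addDocument(doc);
        }
      }

      // The subset of documents to search within (~100k keys in the real case).
      List<BytesRef> subset = new ArrayList<>();
      subset.add(new BytesRef("doc0"));
      subset.add(new BytesRef("doc2"));
      subset.add(new BytesRef("doc3"));

      // One query for the whole set of keys, instead of one clause per key.
      // It can be built once and reused across searches.
      Query subsetFilter = new TermInSetQuery("id", subset);

      Query query = new BooleanQuery.Builder()
          .add(new TermQuery(new Term("body", "lucene")), BooleanClause.Occur.MUST)
          .add(subsetFilter, BooleanClause.Occur.FILTER) // restricts, does not score
          .build();

      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        // Only doc0 and doc3 satisfy both clauses.
        return new IndexSearcher(reader).count(query);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(countMatches());
  }
}
```

Since the set of keys does not count against the BooleanQuery clause limit here, raising the max clause count should not be needed for this approach.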


Doing better than that would require a better understanding of what you are
trying to achieve. For instance, if you end up with such a large list of terms
because you're trying to evaluate a join, you may want to look at Lucene's
support for query-time joins:
https://lucene.apache.org/core/10_1_0/join/org/apache/lucene/search/join/package-summary.html#query-time-joins-heading

On Tue, Aug 5, 2025, 05:48, Thomas Barr <[email protected]> wrote:

> I have a medium-sized (~10m) Lucene index and I frequently want to
> repeatedly search within a subset of around ~100k documents. I can increase
> MaxClauseCount and build up a huge TermQuery, keep that around, then build
> a BooleanQuery out of the result at runtime, but the resulting query is
> quite slow. A Filter backed by a BitSet would have been a good option,
> but Filter is now deprecated.
>
> Any thoughts on the best way to do this?
>
> Thanks!
> -twb
>
> Adrien
