On Mon, Apr 3, 2017 at 6:25 PM, Adrien Grand <jpou...@gmail.com> wrote:
> Large boolean queries can cause a lot of random access as each sub clause
> is advanced one after the other. Even in the case that everything fits in
> the filesystem cache, the fact that the heap needs to be rebalanced after
> each document makes queries with many clauses slow. This is why we have
> TermInSetQuery (TermsQuery on 6.x): it has a more disk-friendly access
> pattern (1 seek per term per segment) and scales better with the number of
> terms. Unfortunately, it does not come only with benefits: its main
> drawback is that it is always evaluated against the entire index. So if you
> intersect a very selective query (on an id field for instance) with a large
> TermInSetQuery, the TermInSetQuery will dominate the execution time for
> sure.
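
For anyone following along, here is a rough toy model of the two access
patterns described above. It is only a sketch under loud assumptions:
postings are plain sorted int[] arrays, and the class and method names
(DisjunctionSketch, heapUnion, bitSetUnion) are made up for illustration.
It is not how Lucene implements either query, just the general shape.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.PriorityQueue;

// Toy model only: postings are sorted int[] arrays of doc ids, not Lucene structures.
class DisjunctionSketch {

    // Cursor over the postings list of one clause.
    static final class Cursor {
        final int[] docs;
        int pos = 0;
        Cursor(int[] docs) { this.docs = docs; }
        int doc() { return docs[pos]; }
        boolean next() { return ++pos < docs.length; }
    }

    // Boolean-disjunction style: a heap ordered by each clause's current doc id.
    // Cursors are polled and re-added for every matching document, so the heap
    // is rebalanced per document and the clauses are advanced in interleaved order.
    static List<Integer> heapUnion(List<int[]> postings) {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.doc(), b.doc()));
        for (int[] p : postings) {
            if (p.length > 0) heap.add(new Cursor(p));
        }
        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            int doc = heap.peek().doc();
            result.add(doc);
            // Advance every cursor sitting on this doc; each poll/add is O(log n).
            while (!heap.isEmpty() && heap.peek().doc() == doc) {
                Cursor c = heap.poll();
                if (c.next()) heap.add(c);
            }
        }
        return result;
    }

    // Term-set style: read each postings list once, front to back, into a bit set.
    // The whole set is built no matter how selective any other clause would be.
    static List<Integer> bitSetUnion(List<int[]> postings, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int[] p : postings) {
            for (int d : p) bits.set(d);
        }
        List<Integer> result = new ArrayList<>();
        for (int d = bits.nextSetBit(0); d >= 0; d = bits.nextSetBit(d + 1)) {
            result.add(d);
        }
        return result;
    }
}
```

Both methods return the same doc ids; the difference is that heapUnion
does heap work per matching document, while bitSetUnion does one
sequential pass per term but always pays for every term in the set.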

One such case which we do have is searching on file digests, where all
the values are spread across the entire index, and the common prefixes
don't allow much of a win from things like automata. For those,
though, TermsQuery might still work.

The problem is more things like word lists, where one "word" might
analyse to multiple terms, making a phrase query - which prevents
using TermsQuery. Collapsing it to some kind of conditional
multi-phrase query... yeah, I have no idea whether there is any
sensible way to do it.
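
To make the word-list problem concrete, here is a minimal sketch that
splits a list by how many terms each entry analyses to. Everything in it
is an assumption for illustration: WordListSketch is a hypothetical
helper, and analyse() is a crude stand-in for a real analyzer (lower-case,
split on non-alphanumerics). Single-term entries are the ones that could
plausibly still go into a TermsQuery; the multi-term entries are the part
that still needs phrase queries, and this does nothing to collapse those.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not a Lucene API: partitions a word list by term count.
class WordListSketch {

    // Stand-in for a real analyzer (assumption): lower-case, split on non-alphanumerics.
    static List<String> analyse(String entry) {
        List<String> terms = new ArrayList<>();
        for (String t : entry.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Entries that analysed to exactly one term.
    final List<String> singleTerms = new ArrayList<>();
    // Entries that analysed to several terms, kept in order as phrase candidates.
    final List<List<String>> phrases = new ArrayList<>();

    WordListSketch(List<String> wordList) {
        for (String entry : wordList) {
            List<String> terms = analyse(entry);
            if (terms.size() == 1) {
                singleTerms.add(terms.get(0));
            } else if (terms.size() > 1) {
                phrases.add(terms);
            }
            // Entries that analyse to no terms at all are dropped.
        }
    }
}
```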

TX
