Hi there,

For a few months, some of us have been running into issues with the cost
estimate from AbstractMultiTermQueryConstantScoreWrapper:
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractMultiTermQueryConstantScoreWrapper.java#L300

In https://github.com/apache/lucene/issues/13029, the problem was raised in
terms of queries not being cached, because the estimated cost was too high.

We've also run into problems in OpenSearch since we started wrapping
MultiTermQueries in IndexOrDocValuesQuery. The MTQ gets an exaggerated cost
estimate, so IndexOrDocValuesQuery decides it should run as a doc values
(DV) query, even though the MTQ would really only match a handful of docs
(and should be the lead iterator).
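
To make the failure mode concrete, here is a minimal sketch of a cost-based
choice of this kind. It is not Lucene's actual code: the class, the method
names, the factor of 8, and all the numbers are assumptions chosen only for
illustration. The point is that the same query flips from the index path to
the DV path purely because its estimate is inflated.

```java
public class CostChoiceSketch {
    enum Choice { INDEX, DOC_VALUES }

    // Hypothetical helper: prefer the index-based query only when its
    // estimated cost, scaled down by an assumed factor, does not exceed
    // the cost of the clause leading the iteration.
    static Choice choose(long indexCostEstimate, long leadCost) {
        final long assumedFactor = 8; // illustrative assumption, not a Lucene constant
        return (indexCostEstimate / assumedFactor) <= leadCost
                ? Choice.INDEX
                : Choice.DOC_VALUES;
    }

    public static void main(String[] args) {
        long leadCost = 1_000;             // made-up cost of the other clause
        long realisticEstimate = 50;       // MTQ really matches a handful of docs
        long inflatedEstimate = 5_000_000; // exaggerated estimate for the same MTQ

        System.out.println(choose(realisticEstimate, leadCost)); // INDEX
        System.out.println(choose(inflatedEstimate, leadCost));  // DOC_VALUES
    }
}
```

With a realistic estimate the MTQ stays on the index path; with the inflated
one it gets pushed to doc values even though it is the cheapest clause.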

I opened a PR back in March (https://github.com/apache/lucene/pull/13201)
to try to handle the case where a MultiTermQuery matches a small number of
terms. Since Mayya moved the rewrite logic that expands up to 16 terms into
a Boolean disjunction earlier in the workflow (in
https://github.com/apache/lucene/pull/13454), we now get the better cost
estimate for MTQs with few terms "for free".

What do folks think?

Thanks,
Froh
