Hi there,

For a few months, some of us have been running into issues with the cost estimate from AbstractMultiTermQueryConstantScoreWrapper (https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractMultiTermQueryConstantScoreWrapper.java#L300).
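
To make the concern concrete, here is a minimal sketch of that estimate. It paraphrases the linked method rather than quoting it, so treat it as an assumption about the logic: the class name, the plain `long` parameters (standing in for `Terms.getSumDocFreq()`, `Terms.size()`, and the query's term count, with -1 meaning "unknown"), and the sample numbers are all made up for illustration.

```java
public class MtqCostSketch {

    /**
     * Hypothetical stand-in for the wrapper's cost estimate.
     * sumDocFreq: total postings in the field; indexedTermCount: unique
     * terms in the field (-1 if unknown); queryTermsCount: number of terms
     * the query expands to (-1 if unknown).
     */
    public static long estimateCost(long sumDocFreq, long indexedTermCount, long queryTermsCount) {
        if (queryTermsCount == -1) {
            // Unknown term count: assume every posting in the field could match.
            return sumDocFreq;
        }
        // Known term count: one doc per query term, plus every posting
        // beyond the first one for each indexed term.
        long potentialExtraCost = sumDocFreq;
        if (indexedTermCount != -1) {
            potentialExtraCost -= indexedTermCount;
        }
        return queryTermsCount + potentialExtraCost;
    }

    public static void main(String[] args) {
        long sumDocFreq = 10_000_000L;  // illustrative field-level statistics
        long indexedTerms = 1_000_000L;

        // Term count unknown (e.g. a prefix or wildcard query).
        System.out.println(estimateCost(sumDocFreq, indexedTerms, -1)); // 10000000

        // Query known to expand to only 3 terms.
        System.out.println(estimateCost(sumDocFreq, indexedTerms, 3));  // 9000003
    }
}
```

Under these assumptions, both estimates are driven by field-wide statistics, so they stay in the millions even if the query would really match only a handful of docs.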

In https://github.com/apache/lucene/issues/13029, the problem was raised in terms of queries not being cached because the estimated cost was too high.

We've also run into problems in OpenSearch since we started wrapping MultiTermQueries in IndexOrDocValuesQuery. The MTQ gets an exaggerated cost estimate, so IndexOrDocValuesQuery decides it should run as a doc-values (DV) query, even though the MTQ would really only match a handful of docs (and should be the lead iterator).

I opened a PR back in March (https://github.com/apache/lucene/pull/13201) to try to handle the case where a MultiTermQuery matches a small number of terms. Since Mayya moved the rewrite logic that expands up to 16 terms (rewriting them as a Boolean disjunction) earlier in the workflow (in https://github.com/apache/lucene/pull/13454), we get the better cost estimate for MTQs on few terms "for free".

What do folks think?

Thanks,
Froh