Hi all,

My question concerns the method setMinimumNumberShouldMatch in the
BooleanQuery class.

Let's assume that we have 3 queries (optional clauses), namely A, B, C, and
we build a BooleanQuery specifying that at least 2 should match.

In terms of semantics, what I understand so far is that

(A B C)~2 is equivalent to ((+A +B) (+A +C) (+B +C)).

In other words, a single BooleanQuery with a min-should-match parameter
could be rewritten as a purely disjunctive BooleanQuery composed of 3
sub-queries, each a conjunction of two of the original clauses.
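
To make the claimed equivalence concrete, here is a minimal sketch in plain
Java (no Lucene involved) that compares the two forms on every combination
of A, B, C matching or not matching. The class and method names are made up
for illustration. It only checks which documents would match, not how they
would be scored.

```java
public class MinShouldMatchCheck {

    // (A B C)~2: at least two of the three optional clauses match.
    static boolean atLeastTwo(boolean a, boolean b, boolean c) {
        int matched = (a ? 1 : 0) + (b ? 1 : 0) + (c ? 1 : 0);
        return matched >= 2;
    }

    // ((+A +B) (+A +C) (+B +C)): a disjunction of pairwise conjunctions.
    static boolean pairwiseDisjunction(boolean a, boolean b, boolean c) {
        return (a && b) || (a && c) || (b && c);
    }

    // Enumerates all 8 assignments and reports whether both forms agree.
    static boolean equivalentOnAllInputs() {
        for (int mask = 0; mask < 8; mask++) {
            boolean a = (mask & 1) != 0;
            boolean b = (mask & 2) != 0;
            boolean c = (mask & 4) != 0;
            if (atLeastTwo(a, b, c) != pairwiseDisjunction(a, b, c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("same matches on all inputs: " + equivalentOnAllInputs());
    }
}
```

Note that for n clauses and a minimum of k, the expanded form needs one
sub-query per k-subset of the clauses, so it grows combinatorially.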

In terms of performance, the two queries seem to behave differently, so the
minMatch property is not only syntactic sugar, and apparently the one form
is not rewritten into the other.

Coming from the SQL world, it is a bit hard for me to justify the addition
of a new operator that looks like syntactic sugar and at the same time
performs better than the more primitive equivalent. I looked a bit in [1]
to understand the motivation for adding this API, but without much success.

To sum everything up in three questions:
1. Did I get the semantics of this extra property right, or are there
things that I am missing?
(If my understanding is correct)
2. What's the reason for introducing the minMatch property in the first
place? (Avoid creating huge queries?)
3. Should the performance of the two queries shown above differ?

Thanks in advance!

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/LUCENE-395
