I've got a problem with a query running against Lucene 7.3 where the
boolean AND is not being applied.

The fields involved are:
1. _missing_, which contains a token for each field missing from the document
2. phoneNationalNumberQueryNgrams, which contains a phone number. The index
analyzer is keyword, and the query analyzer is a right-edge ngram with a
minimum length of 6
(There are other fields, but this is the minimal reproducer.)
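
To make the analyzer setup concrete, here is a rough Python sketch of the
tokens such a query analyzer would produce. It is a plain simulation, not
Lucene code: the function name right_edge_ngrams and the maximum length of 32
are assumptions made for illustration; only the minimum length of 6 comes
from the configuration described above.

```python
def right_edge_ngrams(text, min_len=6, max_len=32):
    """Return the suffixes of text whose length lies in [min_len, max_len]."""
    upper = min(len(text), max_len)
    # An input shorter than min_len yields an empty range, hence no tokens.
    return [text[-n:] for n in range(min_len, upper + 1)]


# Keyword index analyzer: the whole field value is stored as a single token.
indexed = {"987654321"}

for query in ("987654321", "0987654321", "87654321", "54321"):
    tokens = right_edge_ngrams(query)
    hit = any(token in indexed for token in tokens)
    print(query, tokens, "match" if hit else "no match")
```

Under this model, 54321 is shorter than the minimum ngram length and produces
no tokens at all.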

I have in the index one document containing:
_missing_ = phoneCountryCode
phoneNationalNumberQueryNgrams = "987654321"

These queries find the document, as expected:
1. _missing_:phoneCountryCode
2. phoneNationalNumberQueryNgrams:(987654321)
3. phoneNationalNumberQueryNgrams:(0987654321)
4. _missing_:phoneCountryCode AND phoneNationalNumberQueryNgrams:(987654321)

As expected, these queries do not find the document:
1. phoneNationalNumberQueryNgrams:(87654321)
2. phoneNationalNumberQueryNgrams:(54321)
3. _missing_:phoneCountryCode AND phoneNationalNumberQueryNgrams:(87654321)
4. _missing_:phoneCountryCode AND phoneNationalNumberQueryNgrams:(54321)

However, this query finds the document:
(Note that this is the boolean AND of one term that does match the document
and one that does not)
1. _missing_:phoneCountryCode AND phoneNationalNumberQueryNgrams:(54321)

In fact, any combination of a field term that does match and a
phoneNationalNumberQueryNgrams term that is below the minimum ngram length
appears to result in a match.

At a guess, when the query analyzer produces no tokens for a term, that term
is eliminated from the overall expression instead of evaluating to FALSE in
its boolean position.
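
As a rough illustration of this guess, here is a toy Python model of an AND
query in which any clause whose analysis yields no tokens is silently
dropped. It is only a sketch of the suspected behavior, not Lucene's actual
query-building code; build_and_query, matches, and right_edge_ngrams are
hypothetical names introduced for this example.

```python
def right_edge_ngrams(text, min_len=6):
    """Suffixes of text with length >= min_len (empty list if text is shorter)."""
    return [text[-n:] for n in range(min_len, len(text) + 1)]


def build_and_query(clauses):
    """clauses is a list of (field, tokens); clauses without tokens are dropped."""
    return [(field, tokens) for field, tokens in clauses if tokens]


def matches(doc, query):
    """Each surviving clause needs at least one token present in its field."""
    return all(
        any(token in doc.get(field, set()) for token in tokens)
        for field, tokens in query
    )


doc = {
    "_missing_": {"phoneCountryCode"},
    "phoneNationalNumberQueryNgrams": {"987654321"},
}

for number in ("987654321", "87654321", "54321"):
    query = build_and_query([
        ("_missing_", ["phoneCountryCode"]),
        ("phoneNationalNumberQueryNgrams", right_edge_ngrams(number)),
    ])
    print(number, len(query), "clause(s)", matches(doc, query))
```

In this model the 54321 query collapses to the single _missing_ clause and
therefore matches, while the 87654321 query keeps both clauses and does not.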

The workaround is to include another non-eliminatable term in the query:
_missing_:phoneCountryCode AND (phoneNationalNumberQueryNgrams:(54321) OR FALSE)

Is this expected behavior, or a bug?
Is there any downside to the workaround?

Tim
