On 11-Mar-09, at 7:13 PM, Jenny Brown wrote:

I use the boolean logic heavily in a production app, because it's the
grammar that my users understand (and they put together complex
boolean queries in other apps too).  Also, we're not using relevance
ranking.  A document either "matches the query" and gets returned, or
"doesn't match" and doesn't get returned.  We only want yes/no
answers.

I haven't had time to really figure out what the earlier commenter
meant by the conversion to + operator syntax.  I still thought it
would mean the same thing as the query I had posted, i.e., an article
has to match all terms in the AND clauses, and at least one of the
terms in the OR list.  I guess I'm still missing what his explanation
was trying to demonstrate.

Anyway, just a note to say that boolean matching is important to me
and my users; it'd be good if it worked the way it looks like it
would.  If it doesn't, I need to understand better what the current
limitations are.

Well, this is precisely why I am suggesting that we remove it (in some future version of Lucene). Lucene doesn't have a hierarchical Boolean query model that works the way people "expect", and bug reports about discrepancies between how the Boolean operators behave and what intuition suggests get rejected. We are left with something that is convenient if you understand how it works, but if you do understand it, there is no reason you can't just translate your query into the alternate (+/-) syntax yourself.

Lucene's query model is based on REQUIRED, OPTIONAL, and EXCLUDED clauses. A clause with no annotation is always OPTIONAL, and it doesn't affect matching (only scoring) unless there are no REQUIRED clauses at that level. Parentheses () create a subclause (note that the subclause itself is OPTIONAL by default!). Terms joined by AND are translated into REQUIRED clauses, and AND NOT terms are translated into EXCLUDED clauses. REQUIRED clauses are annotated with +, EXCLUDED clauses with -.
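As a rough illustration of these translation rules, here is a minimal Java sketch for flat queries (no parentheses), assuming the default OR operator. It is loosely modeled on how the classic QueryParser marks clauses: an AND makes both of its neighbors REQUIRED, NOT makes the next term EXCLUDED, and OR adds nothing. The names `FlatTranslator`, `translate` and `render` are hypothetical and are not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

public class FlatTranslator {
    enum Occur { REQUIRED, OPTIONAL, EXCLUDED }

    // A term together with its annotation.
    record Clause(String term, Occur occur) {
        @Override public String toString() {
            return switch (occur) {
                case REQUIRED -> "+" + term;
                case EXCLUDED -> "-" + term;
                case OPTIONAL -> term;
            };
        }
    }

    // Translates a flat, whitespace-separated query such as "A AND B OR C"
    // into annotated clauses. Parentheses are NOT handled in this sketch.
    static List<Clause> translate(String query) {
        List<Clause> clauses = new ArrayList<>();
        boolean and = false, not = false;
        for (String tok : query.trim().split("\\s+")) {
            switch (tok) {
                case "AND" -> and = true;
                case "OR" -> { } // OR adds nothing: clauses are OPTIONAL by default
                case "NOT" -> not = true;
                default -> {
                    // An AND also makes the preceding clause REQUIRED, unless it is EXCLUDED.
                    if (and && !clauses.isEmpty()) {
                        Clause prev = clauses.get(clauses.size() - 1);
                        if (prev.occur() != Occur.EXCLUDED) {
                            clauses.set(clauses.size() - 1, new Clause(prev.term(), Occur.REQUIRED));
                        }
                    }
                    Occur occur = not ? Occur.EXCLUDED : (and ? Occur.REQUIRED : Occur.OPTIONAL);
                    clauses.add(new Clause(tok, occur));
                    and = false;
                    not = false;
                }
            }
        }
        return clauses;
    }

    // Renders clauses in the +/- syntax, separated by spaces.
    static String render(List<Clause> clauses) {
        StringBuilder sb = new StringBuilder();
        for (Clause c : clauses) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(translate("A AND B OR C OR D OR E OR F"))); // +A +B C D E F
        System.out.println(render(translate("C OR D OR E OR F")));            // C D E F
        System.out.println(render(translate("A AND NOT B")));                 // +A -B
    }
}
```

Handling () would need a recursive parser that produces nested clause lists; this sketch deliberately leaves that out to keep the flat rules visible.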

A AND B OR C OR D OR E OR F
-> +A +B C D E F
-> find documents that match clause A and clause B (other clauses don't affect matching)

C OR D OR E OR F
-> C D E F
-> find documents matching at least one of these clauses

A AND (B OR C OR D OR E OR F)
-> +A +(B C D E F)
-> find documents that match A, and match at least one of B, C, D, E, or F

(A AND B) OR C OR D OR E OR F
-> (+A +B) C D E F
-> find documents that match at least one of C, D, E, F, or both of A and B
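To check these examples mechanically, here is a minimal Java sketch that evaluates REQUIRED/OPTIONAL/EXCLUDED clauses against a document modeled as a set of terms. It ignores scoring entirely and assumes the rule above: every REQUIRED clause must match, no EXCLUDED clause may match, and OPTIONAL clauses only decide matching when a level has no REQUIRED clause. The types `Term`, `Group`, `Clause` and the `matches` method are hypothetical illustrations, not Lucene's `BooleanQuery` API.

```java
import java.util.List;
import java.util.Set;

public class BooleanMatchSketch {
    enum Occur { REQUIRED, OPTIONAL, EXCLUDED }

    // A query node is either a single term or a group of annotated clauses.
    interface Node {}
    record Term(String text) implements Node {}
    record Clause(Occur occur, Node node) {}
    record Group(List<Clause> clauses) implements Node {}

    // Small helpers to keep the example queries readable.
    static Clause req(Node n) { return new Clause(Occur.REQUIRED, n); }
    static Clause opt(Node n) { return new Clause(Occur.OPTIONAL, n); }
    static Term t(String s) { return new Term(s); }
    static Group g(Clause... cs) { return new Group(List.of(cs)); }

    // Yes/no matching only; a document is the set of terms it contains.
    static boolean matches(Node node, Set<String> doc) {
        if (node instanceof Term term) {
            return doc.contains(term.text());
        }
        Group group = (Group) node;
        boolean hasRequired = false, anyOptionalMatched = false;
        for (Clause c : group.clauses()) {
            boolean m = matches(c.node(), doc);
            switch (c.occur()) {
                case REQUIRED -> { hasRequired = true; if (!m) return false; }
                case EXCLUDED -> { if (m) return false; }
                case OPTIONAL -> anyOptionalMatched |= m;
            }
        }
        // With at least one REQUIRED clause, OPTIONAL clauses are irrelevant for matching.
        return hasRequired || anyOptionalMatched;
    }

    public static void main(String[] args) {
        // +A +B C D E F
        Group q1 = g(req(t("A")), req(t("B")), opt(t("C")), opt(t("D")), opt(t("E")), opt(t("F")));
        // +A +(B C D E F)
        Group q3 = g(req(t("A")), req(g(opt(t("B")), opt(t("C")), opt(t("D")), opt(t("E")), opt(t("F")))));
        // (+A +B) C D E F
        Group q4 = g(opt(g(req(t("A")), req(t("B")))), opt(t("C")), opt(t("D")), opt(t("E")), opt(t("F")));

        Set<String> docAB = Set.of("A", "B");
        Set<String> docAC = Set.of("A", "C");
        System.out.println(matches(q1, docAB));        // true: C..F are irrelevant for matching
        System.out.println(matches(q1, docAC));        // false: B is REQUIRED
        System.out.println(matches(q3, docAC));        // true: A plus one of B..F
        System.out.println(matches(q4, docAB));        // true: the (+A +B) subclause matches
        System.out.println(matches(q4, Set.of("A")));  // false: A alone satisfies no clause
    }
}
```

The first line of output shows the point of the first example: a document containing only A and B matches `+A +B C D E F` even though none of the OR'd terms are present.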

The key takeaway: once a set of clauses at the same level contains an AND, the ORs at that level are completely irrelevant for matching; they only influence scoring.

-Mike

