I am not sure if this is a problem with Lucene or if I am building my
Query object improperly. It seems to me, when performing a search that
should exclude certain terms, MultiFieldQueryParser doesn't filter out
documents when it should. Consider the following example to clarify
what I am talking about.
Say the index contains a document with two fields: title and
description. The value stored in the title field is "chocolate shoes"
and the value in description is "hazardous candy".
If I pass in the following query "+chocolate -hazardous", then the
document mentioned above IS returned as a result. I did a little
investigation and noticed that only if the term "hazardous" exists in
every single field that is a part of MultiFieldQueryParser, will the
document be filtered out of the search results.
Other queries such as "+chocolate" or "+chocolate +hazardous" seem to
work fine.
One note: I did notice the following text in the FAQ section of the
lucene website: "Also MultiFieldQueryParser builds queries that
sometimes behave unexpectedly, namely for AND queries: it requires alls
terms to appear in all field. This is not what one typically wants, for
example in a search over "title" and "body" fields (Lucene 1.9 fixes
this problem)." - it seems there has been some problems noticed in the
past, perhaps they didn't fix all use cases in regards to this.
I am currently using lucene 2.0.0.
Here is the code I am using to build the Query object:
BooleanQuery q = new BooleanQuery();
String[] fields = new String[]{ "name", "description" };
Query keywordQuery = MultiFieldQueryParser.parse(keywords, fields,
new
BooleanClause.Occur[]{BooleanClause.Occur.SHOULD,
BooleanClause.Occur.SHOULD}
new StandardAnalyzer());
q.add(keywordQuery, BooleanClause.Occur.MUST); //true, false);
Any help or suggestions is appreciated,
Scott Sellman