In Lucene 3.4, I recently implemented "Translating PhraseQuery to
SpanNearQuery" (see Lucene in Action, page 220) because I wanted _order_ to
matter.
Here is my exact code, called from getFieldQuery once I know I'm looking at a
PhraseQuery. I believe it is taken straight from the book.
    static Query buildSpanNearQuery(PhraseQuery phraseQ, int slop) {
        // Wrap each term of the phrase in its own SpanTermQuery.
        Term[] terms = phraseQ.getTerms();
        SpanTermQuery[] clauses = new SpanTermQuery[terms.length];
        for (int i = 0; i < terms.length; i++) {
            clauses[i] = new SpanTermQuery(terms[i]);
        }
        // PHRASE_ORDER_MATTERS is the inOrder flag passed to SpanNearQuery.
        SpanNearQuery query = new SpanNearQuery(clauses, slop,
                PHRASE_ORDER_MATTERS);
        return query;
    }
I put this in my own QueryParser and things looked good until I tried a phrase
with stop words.
With the old PhraseQuery I got results for a phrase containing stop words
without extending the slop, but with SpanNearQuery nothing is found unless the
query includes some slop.
This conflicts with the typical use case of a user taking a phrase, pasting
it into the search bar with quotes, and expecting to find the document.
I can't just add a fixed amount of slop, because the amount needed depends on
how many stop words occur in each sequence within the phrase.
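One possible direction, as an untested sketch rather than a confirmed fix:
derive the extra slop from the phrase itself instead of guessing it. This
assumes the parser keeps position increments, so that each removed stop word
leaves a gap in the positions reported by PhraseQuery.getPositions(), and that
slop on an ordered SpanNearQuery counts the unmatched positions between
clauses. The class and the helper name gapCount are made up for illustration,
and the example uses plain int arrays so it runs without Lucene.

```java
// Hypothetical helper, not part of Lucene: counts the position gaps
// (for example, removed stop words) inside a phrase.
class SlopFromPositions {

    // positions: term positions in ascending order, as would be returned
    // by PhraseQuery.getPositions() (assumption: gaps are preserved).
    static int gapCount(int[] positions) {
        if (positions.length < 2) {
            return 0; // a single term has no gaps
        }
        int span = positions[positions.length - 1] - positions[0];
        // Adjacent terms span exactly (length - 1) positions; anything
        // beyond that is a gap. Clamp at 0 for terms sharing a position.
        return Math.max(0, span - (positions.length - 1));
    }

    public static void main(String[] args) {
        // "quick [the] brown [of the] fox": kept terms at positions 0, 2, 5
        System.out.println(gapCount(new int[] {0, 2, 5})); // prints 3
    }
}
```

The result would be added to the user's slop before calling
buildSpanNearQuery. This is looser than a true exact match, since the extra
slop can be spent anywhere in the phrase, not only where the stop words were.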
Any suggestions on how to combine the idea of SpanNear (so that a phrase whose
words appear in order scores better) with text that has had its stop words
removed, so that I can support the simple use of quotes for exact quoted-text
matching?
Any ideas?
-Paul