> One of these documents has the line "access, the
> manager". When searching for the phrase "access manager", this document is
> being returned. I understand why (at least i think i do), because a stop
> word is "the" and the "," is being removed by the tokenizer, my question is
> is there any way I can avoid having this returned in the results?
I don't think you can without reindexing the documents and changing
QueryParser a bit. The reason is that even if you introduce a new
tokenizer/analyzer, the original documents have already been indexed with
those stop words removed.
You have to create an analyzer that doesn't drop your stop words and start the
reindexing again.
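
To make the effect concrete, here is a rough, self-contained sketch of why the
match happens. It does not use Lucene at all: StopWordDemo, tokenize and
containsPhrase are names I made up for illustration, and the stop word list is
just an assumption. It only mimics an analyzer that drops stop words and
treats the remaining tokens as adjacent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; this is not Lucene's Analyzer API.
class StopWordDemo {
    // Assumed stop word list, much shorter than a real one.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "and", "a", "of"));

    // Lower-cases the text, splits on anything that is not a letter or digit
    // (so the "," disappears), and optionally drops stop words.
    static List<String> tokenize(String text, boolean dropStopWords) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (t.isEmpty()) continue;
            if (dropStopWords && STOP_WORDS.contains(t)) continue;
            tokens.add(t);
        }
        return tokens;
    }

    // True when the phrase tokens occur consecutively in the document tokens.
    static boolean containsPhrase(List<String> doc, List<String> phrase) {
        return !phrase.isEmpty() && Collections.indexOfSubList(doc, phrase) >= 0;
    }

    public static void main(String[] args) {
        String doc = "access, the manager";
        String phrase = "access manager";
        // Stop words dropped: [access, manager] contains [access, manager].
        System.out.println(containsPhrase(tokenize(doc, true), tokenize(phrase, true)));   // true
        // Stop words kept: [access, the, manager] does not.
        System.out.println(containsPhrase(tokenize(doc, false), tokenize(phrase, false))); // false
    }
}
```

The second case is what you get once the documents are reindexed with an
analyzer that keeps "the".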
However, you must be careful when using your custom analyzer for query
parsing, because sometimes you may still want to drop the stop words in a
non-quoted query, so
hello and world ---> +hello +world
but
"hello and world" --> +"hello and world"
One solution I can think of is to pass two analyzers to QueryParser: one is
the "standard" analyzer and the other is the "phrase query" analyzer. In
QueryParser.jj, around this area, do something like this:
| term=<QUOTED>
[ slop=<SLOP> ]
[ <CARAT> boost=<NUMBER> ]
{
if (phraseAnalyzer != null) {
// use phrase query custom analyzer that doesn't drop stop words
} else {
// otherwise use normal analyzer
}
This may work; as a matter of fact, I think it should.
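
For what it's worth, the two-analyzer idea can be sketched outside of Lucene
like this. Everything here is hypothetical (the class TwoAnalyzerQuerySketch,
the Function-based "analyzers", the stop word list); it is not the real
QueryParser, just a toy that picks the phrase analyzer for quoted text and the
normal one otherwise.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy parser; it only illustrates choosing an analyzer per clause.
class TwoAnalyzerQuerySketch {
    // Assumed stop word list.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "and", "a", "of"));
    // Matches either a quoted phrase (group 1) or a bare term (group 2).
    static final Pattern CLAUSE = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> tokenize(String text, boolean dropStopWords) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (t.isEmpty()) continue;
            if (dropStopWords && STOP_WORDS.contains(t)) continue;
            tokens.add(t);
        }
        return tokens;
    }

    static List<String> dropStops(String text) { return tokenize(text, true); }
    static List<String> keepAll(String text) { return tokenize(text, false); }

    private final Function<String, List<String>> analyzer;
    private final Function<String, List<String>> phraseAnalyzer; // may be null

    TwoAnalyzerQuerySketch(Function<String, List<String>> analyzer,
                           Function<String, List<String>> phraseAnalyzer) {
        this.analyzer = analyzer;
        this.phraseAnalyzer = phraseAnalyzer;
    }

    // Returns the parsed query in the "+term" / +"phrase" notation used above.
    String parse(String query) {
        List<String> clauses = new ArrayList<>();
        Matcher m = CLAUSE.matcher(query);
        while (m.find()) {
            if (m.group(1) != null) {
                // Quoted text: prefer the phrase analyzer, fall back to the normal one.
                Function<String, List<String>> a =
                        phraseAnalyzer != null ? phraseAnalyzer : analyzer;
                List<String> tokens = a.apply(m.group(1));
                if (!tokens.isEmpty()) {
                    clauses.add("+\"" + String.join(" ", tokens) + "\"");
                }
            } else {
                // Bare term: the normal analyzer may drop it entirely.
                for (String t : analyzer.apply(m.group(2))) {
                    clauses.add("+" + t);
                }
            }
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        TwoAnalyzerQuerySketch qp = new TwoAnalyzerQuerySketch(
                TwoAnalyzerQuerySketch::dropStops, TwoAnalyzerQuerySketch::keepAll);
        System.out.println(qp.parse("hello and world"));     // +hello +world
        System.out.println(qp.parse("\"hello and world\"")); // +"hello and world"
    }
}
```

Passing null as the second argument gives the old behaviour, where the quoted
phrase also loses its stop words.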
HTH
victor