Hi Everyone,
I am seeing some strange behaviour with analyzers. I use SimpleAnalyzer
for some fields in my Compass entity, but I also wrote a custom analyzer
that differs slightly from SimpleAnalyzer, because I wanted to allow digits
as well as letters in the company name column.
The custom analyzer, CompanyNameAnalyzer, uses the following tokenizer:
import java.io.Reader;
import org.apache.lucene.analysis.CharTokenizer;

public class CompanyNameTokenizer extends CharTokenizer {
    /** Construct a new CompanyNameTokenizer. */
    public CompanyNameTokenizer(Reader in) {
        super(in);
    }

    /** Lower-cases every character that is kept in a token. */
    protected char normalize(char c) {
        return Character.toLowerCase(c);
    }

    /** Token characters are letters, digits, and the '@' character. */
    protected boolean isTokenChar(char c) {
        // Keep '@' inside tokens in addition to letters and digits
        return Character.isLetterOrDigit(c) || c == '@';
    }
}
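
For illustration, the splitting rule of the tokenizer above can be reproduced without any Lucene dependency. This is only a rough sketch: the class name TokenizerSketch, the tokenize helper, and the sample input are hypothetical, and it ignores Lucene details such as the maximum token length. It only shows that the rule itself keeps 'will' as a token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone class; it mirrors isTokenChar/normalize from
// CompanyNameTokenizer but does not use Lucene at all.
class TokenizerSketch {

    // Same rule as in the tokenizer: letters, digits, and '@' belong to a token.
    static boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c) || c == '@';
    }

    // Splits the text at every non-token character and lower-cases the rest.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (isTokenChar(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [good, will, co, 24@7]
        System.out.println(tokenize("Good Will & Co. 24@7"));
    }
}
```

Since 'will' survives this rule, the tokenizer by itself does not appear to be what removes it.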
But for some reason, when I search for +companyName:good +companyName:will,
the word 'will' is ignored and I get results that match only 'good'. I guess
this means that 'will' is being stripped because it is a stop word. I don't
understand why this happens, because the custom analyzer I am using does not
use StopFilter. So why would it filter out stop words?
I tried looking at the Lucene source too, but no luck. Any suggestions would
be appreciated.
Thanks,
Harini