> Can someone please point me in the right direction.
>
> We are creating an application that needs to beable to
> search on C++ and get
> back doc's that have C++ in it. The StandardAnalyzer
> does not seem to index
> the "+", so a search for "C++" will bring back docs that
> contain, C++, C,
> C#, etc..... The WhiteSpaceAnalyzer will index the
> "+", but if we have the
> term "C++." that is, if C++ is at the end of a sentence, it
> will index
> "C++." so a search for "C++" will not return the doc.
> I have heard of maybe
> a CustomAnalyzer; however, it seems like there would
> actually need to be a
> CustomFilter/CustomTokenizer, I looked at:
> - StandardAnalyzer.java
> - StandardFilter.java
> - StandardTokenizer.java
> - StandardTokenizerImpl.java
> - StandardTokenizerImpl.jflex
>
> I would guess that the StandardTokenizer is where the
> changes would need to
> be made to allow the "+" character, but I am unclear as to
> how.
>
> Any and all help is greatly appreciated.
One option is to modify StandardTokenizerImpl.jflex and generate
CustomTokenizerImpl.java so that it will recognize C++ and C# as one token. You
need to write a new Tokenizer that uses that CustomTokenizerImpl.java.
Other option can be to extend CharTokenizer. Modify the source code of
LetterTokenizer :
@Override
protected boolean isTokenChar(char c) {
return Character.isLetter(c) || c=='+' || c=='#';
}
Hope this helps.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]