hi all,
I would like to implement the possibility to search for "C++" and "C#" -
I found in the archive the hint to customize the appropriate *.jj file
with the code in NutchAnalysis.jj:
// irregular words
| <#IRREGULAR_WORD: (<C_PLUS_PLUS>|<C_SHARP>)>
| <#C_PLUS_PLUS: ("C"|"c") "++" >
| <#C_SHARP: ("C"|"c") "#" >
I am using a custum analyzer with the yonik's WordDelimiterFilter:
@Override
public TokenStream tokenStream(String fieldName, Reader reader) {
return new LowerCaseFilter(new WordDelimiterFilter(new
WhitespaceTokenizer(reader),1,1,1,1,1 ));
}
But as I can see WordDelimiterFilter uses only the WhiteSpaceTokenizer
which does not use a Java-CC file.
What would be the best way to integrate (anyway, preferably not changing
lucene-src) this feature?
Should I override the WhitespaceTokenizer and using java-cc ( are there
any docs on doing this?).
tia,
martin
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]