hi all,

I would like to make it possible to search for "C++" and "C#". In the
archive I found a hint to customize the appropriate *.jj file by adding
this code to NutchAnalysis.jj:

     // irregular words
| <#IRREGULAR_WORD: (<C_PLUS_PLUS>|<C_SHARP>)>
| <#C_PLUS_PLUS: ("C"|"c") "++" >
| <#C_SHARP: ("C"|"c") "#" >

I am using a custom analyzer with Yonik's WordDelimiterFilter:

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new LowerCaseFilter(
            new WordDelimiterFilter(new WhitespaceTokenizer(reader), 1, 1, 1, 1, 1));
    }


But as far as I can see, WordDelimiterFilter here only gets its input
from WhitespaceTokenizer, which is not generated from a JavaCC file.

What would be the best way to integrate this feature, preferably without
changing the Lucene source?

Should I override WhitespaceTokenizer and use JavaCC? Are there any docs
on doing this?
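
To make the goal concrete, here is a rough sketch of the behaviour I am
after, in plain Java with no Lucene classes. The class name
IrregularWordSplitter, the IRREGULAR set and the tokenize method are
made-up names for illustration only, and the punctuation handling is an
assumption. It is not how WordDelimiterFilter or NutchAnalysis.jj
actually work. It splits on whitespace, lowercases, keeps "c++" and "c#"
whole, and splits every other word on non-alphanumeric characters:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class IrregularWordSplitter {

    // Hypothetical list of "irregular words" that must not be split.
    private static final Set<String> IRREGULAR =
            new HashSet<>(Arrays.asList("c++", "c#"));

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // Step 1: whitespace tokenization.
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            // Step 2: lowercase and drop trailing sentence punctuation
            // (assumption: "C++." at the end of a sentence means "c++").
            String word = raw.toLowerCase(Locale.ROOT)
                             .replaceAll("[,.;:!?]+$", "");
            if (IRREGULAR.contains(word)) {
                // Step 3a: irregular words pass through untouched.
                tokens.add(word);
            } else {
                // Step 3b: everything else is split on any run of
                // characters that are neither letters nor digits.
                for (String part : word.split("[^\\p{L}\\p{N}]+")) {
                    if (!part.isEmpty()) {
                        tokens.add(part);
                    }
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("I like C++ and c# code-gen."));
    }
}
```

With this sketch, tokenize("I like C++ and c# code-gen.") gives
[i, like, c++, and, c#, code, gen]. Inside a real analyzer the same
check would have to happen before the delimiter splitting, which is
the part I do not know how to do cleanly.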

tia,
martin




