To give more detail: I implemented an FTP search engine for campus students. It
has to handle many kinds of words, including Chinese words, so I
can't simply use WhitespaceAnalyzer. My analyzer currently looks like this:
StandardTokenizer tokenStream = new StandardTokenizer(reader, replaceInvalidAcronym);
tokenStream.setMaxTokenLength(maxTokenLength);
TokenStream result = new StandardFilter(tokenStream);
result = new LowerCaseFilter(result);
result = new StopFilter(result, stopSet);
result = new SnowballFilter(result, STEMMER);
I modified StandardTokenizer to split words like season09 (e.g. a search
for "friends season 09") into "season" and "09".
With this analyzer, the word "c++" is analyzed as "c".
I know I can modify StandardTokenizer further to achieve my goal, but is there
a neater way?
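
One way to leave the tokenizer alone might be to rewrite such terms into
plain alphabetic placeholders before the text reaches the analyzer, and to
apply the same rewrite to the query string. The sketch below only illustrates
that idea with the standard Java library. The class name SpecialTermRewriter
and the placeholders "cplusplus" and "csharp" are made up for this example,
and it is not part of Lucene.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Lucene class): rewrites terms that the
// tokenizer would truncate into placeholders made of letters only.
final class SpecialTermRewriter {

    // Assumed mapping; extend it with whatever terms need protecting.
    private static final Map<String, String> REPLACEMENTS =
            new LinkedHashMap<String, String>();
    static {
        REPLACEMENTS.put("c++", "cplusplus");
        REPLACEMENTS.put("c#", "csharp");
    }

    // Match "c++" or "c#" in any case, but only when the "c" is not
    // glued to a preceding ASCII letter or digit (so "abc++" is left alone).
    private static final Pattern SPECIAL = Pattern.compile(
            "(?<![A-Za-z0-9])(c\\+\\+|c#)", Pattern.CASE_INSENSITIVE);

    private SpecialTermRewriter() {
    }

    // Returns the text with every matched term replaced by its placeholder.
    static String rewrite(String text) {
        Matcher m = SPECIAL.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1).toLowerCase(Locale.ROOT);
            m.appendReplacement(out,
                    Matcher.quoteReplacement(REPLACEMENTS.get(key)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The rewritten string would then be wrapped in a StringReader and passed to
the analyzer above, both when indexing and when parsing queries, so that
"c++" in a query and "c++" in a file name end up as the same term. The
placeholders still go through the stop and stemming filters, which should be
harmless as long as indexing and searching use the same chain.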
2009/4/9 hyj <[email protected]>
> Hello, 王巍巍!
>
> WhitespaceAnalyzer can work.
>
> ======= On 2009-04-09 15:15:14 you wrote: =======
>
> >I want my Lucene search to handle words like c++ and c#. How can I modify
> >my analyzer to achieve this?
> >
> >--
> >王巍巍(Weiwei Wang)
> >Department of Computer Science
> >Gulou Campus of Nanjing University
> >Nanjing, P.R.China, 210093
> >
> >Mobile: 86-13913310569
> >MSN: [email protected]
> >Homepage: http://cs.nju.edu.cn/rl/weiweiwang
>
> = = = = = = = = = = = = = = = = = = = =
>
>
> Best regards,
>
>
> hyj
> [email protected]
> 2009-04-09
>
>
--
王巍巍(Weiwei Wang)
Department of Computer Science
Gulou Campus of Nanjing University
Nanjing, P.R.China, 210093
Mobile: 86-13913310569
MSN: [email protected]
Homepage: http://cs.nju.edu.cn/rl/weiweiwang