Thanks, Koji,

I followed your advice and changed my analyzer as shown below:

    // Map "c++" to a tokenizer-safe form before tokenization.
    NormalizeCharMap RECOVERY_MAP = new NormalizeCharMap();
    RECOVERY_MAP.add("c++", "cplusplus$");

    // Char filters run on the raw text, ahead of the tokenizer.
    CharFilter filter = new LowercaseCharFilter(reader);
    filter = new MappingCharFilter(RECOVERY_MAP, filter);

    StandardTokenizer tokenStream = new StandardTokenizer(Version.LUCENE_30, filter);
    tokenStream.setMaxTokenLength(maxTokenLength);

    // Token filters run on the tokens the tokenizer emits.
    TokenStream result = new StandardFilter(tokenStream);
    result = new LowerCaseFilter(result);
    result = new StopFilter(enableStopPositionIncrements, result, stopSet);
    result = new SnowballFilter(result, STEMMER);
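To make the intent of the mapping step concrete, here is a minimal plain-Java sketch of the idea. It does not use Lucene at all: `MappingSketch`, `applyCharMapping`, `tokenize`, and `mapThenTokenize` are hypothetical names, and the split-on-non-alphanumerics rule is only a rough stand-in for StandardTokenizer, not its real grammar. It shows why the rewrite has to happen before tokenization, and how the trailing `$` in the replacement presumably keeps adjacent occurrences such as c++c++ apart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class MappingSketch {
    // Stand-in for the char-filter stage (assumption, not Lucene code):
    // lowercase the raw text, then rewrite the literal "c++".
    public static String applyCharMapping(String text) {
        return text.toLowerCase(Locale.ROOT).replace("c++", "cplusplus$");
    }

    // Simplified tokenizer (assumption): split on any run of characters
    // that are neither letters nor digits, and drop empty pieces.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Mapping first, tokenizing second, mirroring the order in the analyzer.
    public static List<String> mapThenTokenize(String text) {
        return tokenize(applyCharMapping(text));
    }

    public static void main(String[] args) {
        // Without the mapping, the plus signs are lost at tokenization.
        System.out.println(tokenize("c++"));            // prints [c]
        // With the mapping, each occurrence survives as its own token.
        System.out.println(mapThenTokenize("C++c++"));  // prints [cplusplus, cplusplus]
    }
}
```

The same ordering matters at query time: whatever text reaches the analyzer must still contain the literal c++ for the mapping to fire.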
I use the same analyzer on the search side. Since this analyzer tokenizes c++ as cplusplus at index time, I expected a search for c++ to work too, because the query should also be tokenized as cplusplus. I tested it on the string c++c++. However, when I search for c++ on the built index, nothing is returned. I do not know what's wrong with my code.

Waiting for your reply.

On Fri, Dec 11, 2009 at 9:43 PM, Weiwei Wang <ww.wang...@gmail.com> wrote:

> Thanks, Koji
>
> On Fri, Dec 11, 2009 at 7:59 PM, Koji Sekiguchi <k...@r.email.ne.jp> wrote:
>
>> MappingCharFilter can be used to convert c++ to cplusplus.
>>
>> Koji
>>
>> --
>> http://www.rondhuit.com/en/
>>
>> Anshum wrote:
>>
>>> How about getting the original token stream and then converting c++ to
>>> cplusplus or any other such transform. Or perhaps you might look at
>>> using/extending (in the non-Java sense) some other tokenizer!
>>>
>>> --
>>> Anshum Gupta
>>> Naukri Labs!
>>> http://ai-cafe.blogspot.com
>>>
>>> The facts expressed here belong to everybody, the opinions to me. The
>>> distinction is yours to draw............
>>>
>>> On Fri, Dec 11, 2009 at 11:00 AM, Weiwei Wang <ww.wang...@gmail.com>
>>> wrote:
>>>
>>>> Hi, all,
>>>> I designed an FTP search engine based on Lucene. I did a few
>>>> modifications to the StandardTokenizer.
>>>> My problem is:
>>>> C++ is tokenized as c by StandardTokenizer and I want to recover it
>>>> from the TokenStream produced by StandardTokenizer.
>>>>
>>>> What should I do?
>>>>
>>>> --
>>>> Weiwei Wang
>>>> Alex Wang
>>>> 王巍巍
>>>> Room 403, Mengmin Wei Building
>>>> Computer Science Department
>>>> Gulou Campus of Nanjing University
>>>> Nanjing, P.R.China, 210093
>>>>
>>>> Homepage: http://cs.nju.edu.cn/rl/weiweiwang

--
Weiwei Wang