MappingCharFilter preserves the offsets of the stream as it was *before* filtering. So if you keep the original string (with c++ not replaced) in a stored field, you can highlight using the reported offsets. The highlighter must either re-analyze the stored text with the same analyzer, or you can use FastVectorHighlighter.
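To illustrate the idea only, here is a minimal plain-Java sketch of offset correction. It is not Lucene's implementation and uses no Lucene classes; the class name OffsetCorrectionSketch and its members are made up for this example. It replaces one string with another and remembers, for every position in the filtered text, which position in the original text it corresponds to, so a token found in the filtered text can be highlighted in the original.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of offset correction; NOT Lucene code.
class OffsetCorrectionSketch {
    final String filtered;          // text after the replacement
    private final int[] toOriginal; // filtered offset -> original offset

    OffsetCorrectionSketch(String original, String from, String to) {
        if (from.isEmpty()) {
            throw new IllegalArgumentException("from must not be empty");
        }
        StringBuilder out = new StringBuilder();
        List<Integer> map = new ArrayList<>();
        int i = 0;
        while (i < original.length()) {
            if (original.startsWith(from, i)) {
                // Map each replacement char into the matched span,
                // clamped to the last char of the match.
                for (int k = 0; k < to.length(); k++) {
                    out.append(to.charAt(k));
                    map.add(i + Math.min(k, from.length() - 1));
                }
                i += from.length();
            } else {
                out.append(original.charAt(i));
                map.add(i);
                i++;
            }
        }
        map.add(original.length()); // sentinel for end-of-text offsets
        filtered = out.toString();
        toOriginal = new int[map.size()];
        for (int j = 0; j < toOriginal.length; j++) {
            toOriginal[j] = map.get(j);
        }
    }

    // Translate an offset in the filtered text back to the original text.
    int correctOffset(int filteredOffset) {
        return toOriginal[filteredOffset];
    }

    public static void main(String[] args) {
        String original = "I like c++ a lot";
        OffsetCorrectionSketch f =
            new OffsetCorrectionSketch(original, "c++", "cplusplus");
        // A tokenizer working on the filtered text sees "cplusplus".
        int start = f.filtered.indexOf("cplusplus");
        int end = start + "cplusplus".length();
        // Corrected offsets select the original characters: prints "c++"
        System.out.println(
            original.substring(f.correctOffset(start), f.correctOffset(end)));
    }
}
```

The point is that the index holds the term cplusplus, while the offsets stored with it refer to the three characters of c++ in the original text, which is what the highlighter needs.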
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Weiwei Wang [mailto:ww.wang...@gmail.com]
> Sent: Sunday, December 13, 2009 11:43 AM
> To: java-user@lucene.apache.org
> Subject: Re: Recover special terms from StandardTokenizer
>
> Problem solved. Now another problem comes.
>
> As I want to use Highlighter in my system, the token offset is incorrect
> after the MappingCharFilter is used.
>
> Koji, do you know how to fix the offset problem?
>
> On Sun, Dec 13, 2009 at 11:12 AM, Weiwei Wang <ww.wang...@gmail.com> wrote:
>
> > I use Luke to check the result and find only c exists as a term, no
> > cplusplus found in the index
> >
> > On Sun, Dec 13, 2009 at 10:34 AM, Weiwei Wang <ww.wang...@gmail.com> wrote:
> >
> >> Thanks, Koji, I followed your advice and changed my analyzer as shown
> >> below:
> >>
> >>   NormalizeCharMap RECOVERY_MAP = new NormalizeCharMap();
> >>   RECOVERY_MAP.add("c++","cplusplus$");
> >>   CharFilter filter = new LowercaseCharFilter(reader);
> >>   filter = new MappingCharFilter(RECOVERY_MAP,filter);
> >>   StandardTokenizer tokenStream = new StandardTokenizer(Version.LUCENE_30, filter);
> >>   tokenStream.setMaxTokenLength(maxTokenLength);
> >>   TokenStream result = new StandardFilter(tokenStream);
> >>   result = new LowerCaseFilter(result);
> >>   result = new StopFilter(enableStopPositionIncrements, result, stopSet);
> >>   result = new SnowballFilter(result, STEMMER);
> >>
> >> I use the same analyzer on the search side. As you know, this analyzer
> >> can tokenize c++ as cplusplus; for this reason, it seems I can search
> >> c++ with the same analyzer because it is also tokenized as cplusplus.
> >>
> >> I tested it on the string c++c++; however, when I search c++ on the
> >> built index, nothing is returned.
> >>
> >> I do not know what's wrong with my code.
> >> Waiting for your reply
> >>
> >> On Fri, Dec 11, 2009 at 9:43 PM, Weiwei Wang <ww.wang...@gmail.com> wrote:
> >>
> >>> Thanks, Koji
> >>>
> >>> On Fri, Dec 11, 2009 at 7:59 PM, Koji Sekiguchi <k...@r.email.ne.jp> wrote:
> >>>
> >>>> MappingCharFilter can be used to convert c++ to cplusplus.
> >>>>
> >>>> Koji
> >>>>
> >>>> --
> >>>> http://www.rondhuit.com/en/
> >>>>
> >>>> Anshum wrote:
> >>>>
> >>>>> How about getting the original token stream and then converting c++
> >>>>> to cplusplus or any other such transform. Or perhaps you might look
> >>>>> at using/extending (in the non java sense) some other tokenizer!
> >>>>>
> >>>>> --
> >>>>> Anshum Gupta
> >>>>> Naukri Labs!
> >>>>> http://ai-cafe.blogspot.com
> >>>>>
> >>>>> The facts expressed here belong to everybody, the opinions to me. The
> >>>>> distinction is yours to draw............
> >>>>>
> >>>>> On Fri, Dec 11, 2009 at 11:00 AM, Weiwei Wang <ww.wang...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi, all,
> >>>>>> I designed an FTP search engine based on Lucene. I did a few
> >>>>>> modifications to the StandardTokenizer.
> >>>>>> My problem is:
> >>>>>> C++ is tokenized as c by StandardTokenizer and I want to recover it
> >>>>>> from the TokenStream from StandardTokenizer
> >>>>>>
> >>>>>> What should I do?
> >>>>>>
> >>>>>> --
> >>>>>> Weiwei Wang
> >>>>>> Alex Wang
> >>>>>> 王巍巍
> >>>>>> Room 403, Mengmin Wei Building
> >>>>>> Computer Science Department
> >>>>>> Gulou Campus of Nanjing University
> >>>>>> Nanjing, P.R.China, 210093
> >>>>>>
> >>>>>> Homepage: http://cs.nju.edu.cn/rl/weiweiwang
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >>>> For additional commands, e-mail: java-user-h...@lucene.apache.org