Problem solved. Now I have another problem.

I want to use Highlighter in my system, but the token offsets are incorrect
after MappingCharFilter is applied.

Koji, do you know how to fix the offset problem?
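
To make the offset problem concrete, here is a minimal plain-Java sketch of the idea. It is not Lucene's MappingCharFilter and uses no Lucene classes; the class OffsetMappingSketch and its methods map and correctOffset are hypothetical names made up for illustration. It replaces "c++" with "cplusplus" and keeps a table that translates offsets in the mapped text back to the original text, which is the kind of correction a highlighter would need in order to mark the right span.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration only; this is NOT Lucene's MappingCharFilter. */
class OffsetMappingSketch {

    /** Mapped text plus, for every output position, the matching input offset. */
    static final class Mapped {
        final String text;
        final int[] originalOffsets; // length == text.length() + 1

        Mapped(String text, int[] originalOffsets) {
            this.text = text;
            this.originalOffsets = originalOffsets;
        }

        /** Translate an offset in the mapped text back to the original text. */
        int correctOffset(int mappedOffset) {
            return originalOffsets[mappedOffset];
        }
    }

    /** Replace every occurrence of from with to, recording offsets as we go. */
    static Mapped map(String input, String from, String to) {
        if (from.isEmpty()) {
            throw new IllegalArgumentException("from must not be empty");
        }
        StringBuilder out = new StringBuilder();
        List<Integer> offsets = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            if (input.startsWith(from, i)) {
                // Every char of the replacement points at the start of the match.
                for (int k = 0; k < to.length(); k++) {
                    offsets.add(i);
                }
                out.append(to);
                i += from.length();
            } else {
                // Unmapped chars keep their own input offset.
                offsets.add(i);
                out.append(input.charAt(i));
                i++;
            }
        }
        offsets.add(input.length()); // offset for the end of the text
        int[] table = new int[offsets.size()];
        for (int k = 0; k < table.length; k++) {
            table[k] = offsets.get(k);
        }
        return new Mapped(out.toString(), table);
    }

    public static void main(String[] args) {
        String original = "I like c++ a lot";
        Mapped m = map(original, "c++", "cplusplus");
        int start = m.text.indexOf("cplusplus");
        int end = start + "cplusplus".length();
        // Uncorrected offsets index into the mapped text, not the original.
        System.out.println("mapped:    [" + start + "," + end + ")");
        // Corrected offsets select "c++" in the original text.
        int cs = m.correctOffset(start);
        int ce = m.correctOffset(end);
        System.out.println("corrected: [" + cs + "," + ce + ") -> "
                + original.substring(cs, ce));
    }
}
```

If the offsets reported for a token are the uncorrected ones, a highlighter working on the original text marks a span that is too long or shifted; with the corrected ones it marks exactly "c++".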

On Sun, Dec 13, 2009 at 11:12 AM, Weiwei Wang <[email protected]> wrote:

> I used Luke to check the result and found that only c exists as a term;
> there is no cplusplus term in the index.
>
>
> On Sun, Dec 13, 2009 at 10:34 AM, Weiwei Wang <[email protected]> wrote:
>
>> Thanks, Koji. I followed your advice and changed my analyzer as shown
>> below:
>> // Map "c++" to a replacement string before tokenization.
>> NormalizeCharMap RECOVERY_MAP = new NormalizeCharMap();
>> RECOVERY_MAP.add("c++", "cplusplus$");
>> // Chain the char filters: lowercase first, then apply the mapping.
>> CharFilter filter = new LowercaseCharFilter(reader);
>> filter = new MappingCharFilter(RECOVERY_MAP, filter);
>> StandardTokenizer tokenStream =
>>     new StandardTokenizer(Version.LUCENE_30, filter);
>> tokenStream.setMaxTokenLength(maxTokenLength);
>> // Token filters applied after tokenization.
>> TokenStream result = new StandardFilter(tokenStream);
>> result = new LowerCaseFilter(result);
>> result = new StopFilter(enableStopPositionIncrements, result, stopSet);
>> result = new SnowballFilter(result, STEMMER);
>>
>> I use the same analyzer on the search side. Since this analyzer tokenizes
>> c++ as cplusplus, I expected to be able to search for c++ with the same
>> analyzer, because the query is also tokenized as cplusplus.
>>
>> I tested it on the string c++c++. However, when I search for c++ in the
>> built index, nothing is returned.
>>
>> I do not know what's wrong with my code. Waiting for your reply.
>>
>> On Fri, Dec 11, 2009 at 9:43 PM, Weiwei Wang <[email protected]> wrote:
>>
>>> Thanks, Koji
>>>
>>>
>>> On Fri, Dec 11, 2009 at 7:59 PM, Koji Sekiguchi <[email protected]> wrote:
>>>
>>>> MappingCharFilter can be used to convert c++ to cplusplus.
>>>>
>>>> Koji
>>>>
>>>> --
>>>> http://www.rondhuit.com/en/
>>>>
>>>>
>>>>
>>>> Anshum wrote:
>>>>
>>>>> How about getting the original token stream and then converting c++ to
>>>>> cplusplus, or applying any other such transform? Or perhaps you might
>>>>> look at using/extending (in the non-Java sense) some other tokenizer.
>>>>>
>>>>> --
>>>>> Anshum Gupta
>>>>> Naukri Labs!
>>>>> http://ai-cafe.blogspot.com
>>>>>
>>>>> The facts expressed here belong to everybody, the opinions to me. The
>>>>> distinction is yours to draw............
>>>>>
>>>>>
>>>>> On Fri, Dec 11, 2009 at 11:00 AM, Weiwei Wang <[email protected]>
>>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Hi, all,
>>>>>>    I designed an FTP search engine based on Lucene. I made a few
>>>>>> modifications to the StandardTokenizer.
>>>>>> My problem is:
>>>>>>  C++ is tokenized as c by StandardTokenizer, and I want to recover it
>>>>>> from the TokenStream produced by StandardTokenizer.
>>>>>>
>>>>>> What should I do?
>>>>>>
>>>>>> --
>>>>>> Weiwei Wang
>>>>>> Alex Wang
>>>>>> 王巍巍
>>>>>> Room 403, Mengmin Wei Building
>>>>>> Computer Science Department
>>>>>> Gulou Campus of Nanjing University
>>>>>> Nanjing, P.R.China, 210093
>>>>>>
>>>>>> Homepage: http://cs.nju.edu.cn/rl/weiweiwang
>>>>>>



-- 
Weiwei Wang
Alex Wang
王巍巍
Room 403, Mengmin Wei Building
Computer Science Department
Gulou Campus of Nanjing University
Nanjing, P.R.China, 210093

Homepage: http://cs.nju.edu.cn/rl/weiweiwang
