Re: StandardAnalyzer exclude numbers

Mark Miller Mon, 22 Sep 2008 05:51:14 -0700

Agreed. I am always diving into that analyzer too fast <g>, possibly
with premature optimization in mind as well. Scanning each token
afterwards in a filter, breaking out of the scan and skipping the token
as soon as you find a digit, will be much easier and possibly not too
much slower. It depends on how involved you want to get, I suppose.
Personally I would prefer to start a new analyzer for such a
significant change, but for the average Lucene user, pre/post
processing is always going to make more sense. Plus there is enough
overlap in the code that I can see plenty of people preferring not to
split off.
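
A minimal, library-independent sketch of the check such a filter would
make, assuming tokens arrive as plain strings. NumberTokenSketch,
containsDigit and dropNumericTokens are hypothetical names, not Lucene
API; the wiring into a Lucene TokenFilter subclass is left out because
it depends on the Lucene version in use.

```java
import java.util.ArrayList;
import java.util.List;

/** Library-independent sketch; class and method names are hypothetical. */
final class NumberTokenSketch {

    /** Returns true if the token text contains at least one digit. */
    static boolean containsDigit(String term) {
        for (int i = 0; i < term.length(); i++) {
            if (Character.isDigit(term.charAt(i))) {
                return true; // stop scanning at the first digit found
            }
        }
        return false;
    }

    /** Returns the input tokens with every digit-bearing token skipped. */
    static List<String> dropNumericTokens(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!containsDigit(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

Note that this drops a mixed token such as "abc123" entirely, whereas
SimpleAnalyzer splits at non-letters and would keep "abc". If only
purely numeric tokens should go, the check would instead have to
require that every character is a digit.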


黄成 wrote:
> why not use a token filter?
>
> On Mon, Sep 22, 2008 at 8:36 PM, Mark Miller <[EMAIL PROTECTED]> wrote:
>
>   
>> [EMAIL PROTECTED] wrote:
>>
>>     
>>> Hello
>>>
>>> Is it possible to exclude numbers using StandardAnalyzer just like
>>> SimpleAnalyzer?
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>
>>
>> It's possible but it's tricky. You would want to copy the StandardAnalyzer
>> into your own Analyzer and then modify the grammar.
>> StandardTokenizerImpl.jflex is where to look, but you will have to learn how
>> to use/compile jflex (look at the build file) to build the parser classes.
>> What you would do though, is start by trying to remove the digit from the
>> Alphanum regex in StandardTokenizerImpl.jflex. You might want to rename
>> alphanum after such a move. That may be as far as you need to go.
>>
>>
>> - Mark

