Agreed. I always dive into that analyzer too fast <g>, possibly with premature optimization in mind as well. Scanning each token afterward in a filter and skipping it if you find a number will be much easier, and possibly not much slower. It depends on how involved you are or want to get, I suppose. Personally I would prefer to start a new analyzer for such a significant change, but for the average Lucene user, pre/post-processing is always going to make more sense. Plus, there is enough overlap in the code that I can see plenty of people preferring not to split off.
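For what it's worth, the scan-and-skip idea can be sketched in plain Java without touching the grammar. This is only an illustration under some assumptions: the class NumberSkippingFilter and its methods are hypothetical names, not Lucene API, and the tokens are assumed to already be available as strings. In a real analyzer the same per-token check would live inside a token filter placed after the tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: post-processes tokens that
// have already been produced by some tokenizer.
final class NumberSkippingFilter {

    // Scan the token and stop at the first digit found.
    static boolean containsDigit(String token) {
        for (int i = 0; i < token.length(); i++) {
            if (Character.isDigit(token.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Keep only the tokens that contain no digit at all.
    // Assumption: a mixed token such as "v2" is dropped entirely.
    static List<String> dropNumericTokens(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!containsDigit(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("lucene", "2008", "release", "v2", "analyzer");
        System.out.println(dropNumericTokens(tokens)); // prints [lucene, release, analyzer]
    }
}
```

Note that this drops the whole token when it contains a digit, which is not quite what SimpleAnalyzer does (it splits on non-letters, so "v2" would come out as "v"). If only purely numeric tokens should go, the check would need to require that every character is a digit instead.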
黄成 wrote:
> why not use a token filter?
>
> On Mon, Sep 22, 2008 at 8:36 PM, Mark Miller <[EMAIL PROTECTED]> wrote:
>
>> [EMAIL PROTECTED] wrote:
>>
>>> Hello
>>>
>>> Is it possible to exclude numbers using StandardAnalyzer just like
>>> SimpleAnalyzer?
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>> It's possible but it's tricky. You would want to copy the StandardAnalyzer
>> into your own Analyzer and then modify the grammar.
>> StandardTokenizerImpl.jflex is where to look, but you will have to learn how
>> to use/compile jflex (look at the build file) to build the parser classes.
>> What you would do though, is start by trying to remove the digit from the
>> Alphanum regex in StandardTokenizerImpl.jflex. You might want to rename
>> alphanum after such a move. That may be as far as you need to go.
>>
>> - Mark