It won't do what I need. My input may contain something like:

"All-In-One is located in 92226-4446 and has an E-A-R"

I want it to be tokenized as follows:

all
one
located
92226
4446
E-A-R

Right now, it is tokenized like this:

all
one
located
92226-4446
E-A-R
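One alternative to editing the grammar, offered here as an assumption and not something proposed in the thread, is to leave StandardAnalyzer as is and split ZIP+4 tokens after tokenization. The plain-Java sketch below works on a List<String> that stands in for the analyzer's token stream. In Lucene the same logic would live in a custom TokenFilter, whose API is not shown. ZipSplitter and splitZipTokens are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ZipSplitter {
    // ZIP+4 form, e.g. 92226-4446: five digits, a hyphen, four digits.
    private static final Pattern ZIP_PLUS_FOUR = Pattern.compile("\\d{5}-\\d{4}");

    // Splits every ZIP+4 token at the hyphen; all other tokens pass through unchanged.
    // Hypothetical helper: in Lucene this would be the body of a TokenFilter.
    public static List<String> splitZipTokens(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (ZIP_PLUS_FOUR.matcher(token).matches()) {
                int dash = token.indexOf('-');
                out.add(token.substring(0, dash));  // five-digit ZIP, e.g. "92226"
                out.add(token.substring(dash + 1)); // four-digit extension, e.g. "4446"
            } else {
                out.add(token);                     // e.g. "E-A-R" stays whole
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> current = List.of("all", "one", "located", "92226-4446", "E-A-R");
        System.out.println(splitZipTokens(current));
        // prints [all, one, located, 92226, 4446, E-A-R]
    }
}
```

Because only tokens that match the ZIP+4 pattern exactly are touched, hyphenated letter tokens such as E-A-R keep their current behavior.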



-----Original Message-----
From: Erick Erickson [mailto:[EMAIL PROTECTED] 
Sent: Thursday, January 11, 2007 6:11 PM
To: java-user@lucene.apache.org
Subject: Re: Modifying StandardAnalyzer

Would it be simpler just to modify the input with a regex rather than
risk messing with StandardAnalyzer? Or wouldn't that do what you need?
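A minimal sketch of the regex idea, assuming it is acceptable to replace the hyphen in a ZIP code with a space before the text is handed to the analyzer. It covers the forms 92626-2646 and 92626- mentioned in the original message. ZipPreprocessor and splitZipCodes are hypothetical names; only java.util.regex is used.

```java
import java.util.regex.Pattern;

public class ZipPreprocessor {
    // Five digits at a word boundary, then a hyphen that is followed by either
    // exactly four digits, a non-word character, or the end of input.
    // This matches the hyphen in "92226-4446" and in a trailing "92626-".
    private static final Pattern ZIP_HYPHEN =
            Pattern.compile("\\b(\\d{5})-(?=\\d{4}\\b|\\W|$)");

    // Replaces the ZIP-code hyphen with a space; everything else is left alone.
    public static String splitZipCodes(String text) {
        return ZIP_HYPHEN.matcher(text).replaceAll("$1 ");
    }

    public static void main(String[] args) {
        String input = "All-In-One is located in 92226-4446 and has an E-A-R";
        System.out.println(splitZipCodes(input));
        // prints All-In-One is located in 92226 4446 and has an E-A-R
    }
}
```

With the hyphen gone, StandardAnalyzer should see 92226 and 4446 as separate tokens, while All-In-One and E-A-R are not touched by the regex at all.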

On 1/11/07, Van Nguyen <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I need to modify StandardAnalyzer so that it will tokenize ZIP codes
> that look like this:
>
> 92626-2646
>
> I think the part I need to modify is here, specifically:
>
> <HAS_DIGIT> <P> <ALPHANUM>
>
>   // floating point, serial, model numbers, ip addresses, etc.
>   // every other segment must have at least one digit
> | <NUM: (<ALPHANUM> <P> <HAS_DIGIT>
>        | <HAS_DIGIT> <P> <ALPHANUM>
>        | <HAS_DIGIT> <M>
>        | <HAS_DIGIT> (<P> <HAS_DIGIT>)+ <M>
>        | <LETTER> (<P> <LETTER>)+
>        | <ALPHANUM> (<P> <HAS_DIGIT> <P> <ALPHANUM>)+
>        | <HAS_DIGIT> (<P> <ALPHANUM> <P> <HAS_DIGIT>)+
>        | <ALPHANUM> <P> <HAS_DIGIT> (<P> <ALPHANUM> <P> <HAS_DIGIT>)+
>        | <HAS_DIGIT> <P> <ALPHANUM> (<P> <HAS_DIGIT> <P> <ALPHANUM>)+
>         )
>   >
>
> Is there a way to keep that line so that StandardAnalyzer works as
> is, but tokenizes anything that looks like
>
> (<HAS_DIGIT> <P>) | (<HAS_DIGIT> <P> <HAS_DIGIT>)
>
> or, even better:
>
> (<DIGIT><DIGIT><DIGIT><DIGIT><DIGIT><P>) |
> (<DIGIT><DIGIT><DIGIT><DIGIT><DIGIT><P><DIGIT><DIGIT><DIGIT><DIGIT>)
>
> I have ZIP codes that look like 92626, 92626-, and 92626-2646.
>
> I've tried adding both lines to the "SKIP" section, but to no
> avail.

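For reference, the three ZIP forms listed in the original message (92626, 92626-, 92626-2646) can be expressed as a single java.util.regex pattern, which may help when checking what a grammar change or a preprocessing regex should accept. This is a sketch under one stated assumption: only '-' is accepted as the separator, whereas the grammar's <P> also covers other punctuation. ZipPattern and isZip are hypothetical names.

```java
import java.util.regex.Pattern;

public class ZipPattern {
    // Five digits, optionally followed by a hyphen, which is optionally
    // followed by exactly four digits: 92626, 92626-, 92626-2646.
    // Assumption: '-' is the only separator accepted here.
    private static final Pattern ZIP = Pattern.compile("\\d{5}(?:-(?:\\d{4})?)?");

    // True only if the whole string is one of the three ZIP forms.
    public static boolean isZip(String s) {
        return ZIP.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"92626", "92626-", "92626-2646", "9262", "92626-264"}) {
            System.out.println(s + " -> " + isZip(s));
        }
        // prints:
        // 92626 -> true
        // 92626- -> true
        // 92626-2646 -> true
        // 9262 -> false
        // 92626-264 -> false
    }
}
```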