Hi Rodrigo,

Here are a couple of things I should have warned you about in our discussion yesterday.
The rules seem to be applied sequentially, and each rule modifies the output of the previous one. This is risky, especially if the rule set grows large, and the author of the rules needs to keep it in mind at all times. For example, there is a rule for "ons$" followed by one for "ions$". The second will never match, because the first rule has already changed the string by the time the second is tried. So although "aimons" and "aimions" should both be reduced to "em", they end up as "em" and "emi". Maybe this could be solved by trying the longest match first.

The other consequence of sequential application is that the context can change: an earlier rule may rewrite the text that a later rule depends on, so some rules can never be reached. I don't remember how we got around this.

Alex

-----Original Message-----
From: Rodrigo Reyes [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, March 12, 2002 3:18 PM
To: Lucene Developers List
Subject: Re: Normalization

> Anyway, I'll try to add a few comments in the source code (although
> it's very small, like 8 small classes) and package it so that the
> Lucene developers can try it. Should be ready tomorrow.

Ok, please find enclosed the archive of the normalizer. To compile it, just type "ant". To test the French normalizer, run "ant test-french"; for the soundex, run "ant test-soundex".

Rodrigo
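
To make the "ons$" / "ions$" ordering problem concrete, here is a minimal sketch in Python. It is not the normalizer's actual code: the three rules are hypothetical, reconstructed only from the "aimons" / "aimions" example, and the helper names `apply_sequentially` and `longest_match_first` are made up for illustration. Sorting by pattern length is just a rough stand-in for "longest match first".

```python
import re

# Hypothetical rules reconstructed from the example in this mail;
# the real rule file in the normalizer may look quite different.
RULES = [
    (r"ons$", ""),   # strips the "-ons" ending
    (r"ions$", ""),  # meant to strip "-ions", but "ons$" gets there first
    (r"ai", "e"),    # vowel normalization
]


def apply_sequentially(word, rules):
    """Apply every rule in order; each rule sees the previous rule's output."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word


def longest_match_first(rules):
    """Return the rules reordered so that longer patterns are tried first.

    Pattern length is only an approximation of match length, and
    reordering can break rule sets that rely on their original order.
    """
    return sorted(rules, key=lambda rule: len(rule[0]), reverse=True)


if __name__ == "__main__":
    for word in ("aimons", "aimions"):
        original = apply_sequentially(word, RULES)
        reordered = apply_sequentially(word, longest_match_first(RULES))
        print(word, original, reordered)
```

With the rules in their original order, "aimons" becomes "em" but "aimions" becomes "emi", because "ons$" strips the ending before "ions$" is ever tried. With the longer pattern first, both words reduce to "em". This only addresses the suffix-shadowing case; it does not solve the more general change-of-context problem.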
