Well, choosing XML for such a description language has the following
drawbacks:
* It is hardly legible. Having one rule per line is really nice; I
appreciated it when writing the French normalizer.
* It does not solve all the parsing problems:
- Either you specify everything as elements or attributes, which is
painful:
<leftContext><range value="aeiou"/>er<range value="tr"/></leftContext>
<rightContext>er<boundary/></rightContext>
- Or you have to write a parser anyway for the content of the
elements:
<rightContext>[aeiou]r$</rightContext>
so you end up writing a second parser for content that the XML parser
has already parsed.
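To make the second point concrete, here is a minimal Java sketch. The class and method names (XmlRuleDemo, extractRightContext, rightContextMatches) are hypothetical, and java.util.regex only stands in for whatever rule-content parser a real normalizer would need. The point it illustrates: the XML parser hands back nothing more than the raw string "[aeiou]r$", which then has to be parsed a second time.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: XML only delivers the raw text of an element;
// the rule notation inside that text still needs a parser of its own.
public class XmlRuleDemo {

    // Stage 1: the XML parser extracts the raw text of <rightContext>.
    static String extractRightContext(String ruleXml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(ruleXml)));
            // Returns a plain string such as "[aeiou]r$"; XML knows nothing more.
            return doc.getElementsByTagName("rightContext").item(0).getTextContent();
        } catch (Exception e) {
            // Malformed XML (e.g. mismatched end tags) or a missing element.
            throw new IllegalArgumentException("not a valid rule: " + ruleXml, e);
        }
    }

    // Stage 2: the extracted string is parsed again. java.util.regex stands in
    // (as an assumption) for the normalizer's own rule parser; the context must
    // match starting right at the text that follows the current position.
    static boolean rightContextMatches(String ruleXml, String following) {
        Pattern context = Pattern.compile(extractRightContext(ruleXml));
        return context.matcher(following).lookingAt();
    }

    public static void main(String[] args) {
        String rule = "<rule><rightContext>[aeiou]r$</rightContext></rule>";
        System.out.println(extractRightContext(rule));        // [aeiou]r$
        System.out.println(rightContextMatches(rule, "er"));  // true
        System.out.println(rightContextMatches(rule, "ers")); // false
    }
}
```

Passing a rule with mismatched tags, such as <rightContext>...</leftContext>, makes stage 1 throw IllegalArgumentException, because XML requires every end tag to match its start tag. Either way, two parsers are involved.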
Rodrigo
----- Original Message -----
From: "Mark Tucker" <[EMAIL PROTECTED]>
To: "Lucene Developers List" <[EMAIL PROTECTED]>
Sent: Monday, March 11, 2002 10:10 PM
Subject: RE: Normalization
> Why not use XML?
>
> <normalizer>
>   <rule>
>     <leftContext></leftContext>
>     <rightContext></rightContext>
>     <transformLetters></transformLetters>
>     <replacementString></replacementString>
>   </rule>
>   <rule>
>     <leftContext></leftContext>
>     <rightContext></rightContext>
>     <transformLetters></transformLetters>
>     <replacementString></replacementString>
>   </rule>
> </normalizer>
>
>
> There are some issues with the characters you use, but using XML might
> make it easier to extend.
>
> Mark