On Nov 16, 2009, at 6:43 PM, Robert Muir wrote:

> DM, in this case I'm not referring to surrogates, etc., but to the idea 
> that properties for an existing character can change (the soft hyphen and 
> Arabic ayah were two examples), and that new characters are introduced.
> 
> these will affect what analysis components (e.g. tokenizers) do, because they 
> like to use categories such as .isWhitespace, .isLetter, things like that.
> 
> this means these components have different behavior, because they are 
> data-driven, even though we didn't change any code. 

Then why not make ICU a dependency? At least then one has control of the 
delivered version. Any of us who are working with texts in non-Latin-1 
languages are likely to be using ICU anyway.
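
To make the point concrete, here is a rough sketch of a tokenizer whose 
behavior comes entirely from a character-property predicate. The class and 
method names (PropertyTokenizer, tokenize) are made up for illustration and 
are not Lucene code. With the JDK's Character::isLetter, the result depends on 
the Unicode tables shipped with whatever JVM is running; with ICU4J on the 
classpath (an assumption, it is not used below) one could pass 
com.ibm.icu.lang.UCharacter::isLetter instead, so the tables are fixed by the 
ICU jar that is delivered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical illustration only; this is not Lucene's tokenizer. */
final class PropertyTokenizer {
    private PropertyTokenizer() {}

    /**
     * Splits text into maximal runs of code points accepted by isTokenChar.
     * The predicate is the data-driven part: swap it and the same code
     * produces different tokens.
     */
    static List<String> tokenize(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            // Iterate by code point so supplementary characters stay whole.
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isTokenChar.test(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // A non-token character closes the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

Calling PropertyTokenizer.tokenize(text, Character::isLetter) uses the JVM's 
property data. Passing a different predicate changes the output without 
touching the loop, which is exactly the version sensitivity being discussed.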

-- DM


