Hi László,
I found out why ZWSP (U+200B, zero-width space) does not work as a word
boundary in ICU. They decided that it is a format character rather than
a spacing character, so it is not treated as a word boundary. I have
submitted a patch to the ICU for OOo that reverts it to a spacing
character.
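For reference, the classification can be checked outside ICU with
Python's unicodedata module, which follows the Unicode Character
Database. This is only a sketch: split_on_zwsp is a hypothetical helper
of mine, not part of ICU or OOo, and it just shows what treating ZWSP as
an explicit boundary would look like.

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE

# General category from Python's bundled Unicode database.
# "Cf" means format character; a spacing character would be "Zs".
zwsp_category = unicodedata.category(ZWSP)

# Because ZWSP is not whitespace, a plain split() leaves it inside the token.
default_tokens = "ab\u200bcd".split()


def split_on_zwsp(text):
    """Hypothetical helper: treat ZWSP as an explicit word boundary."""
    return [word for word in text.split(ZWSP) if word]


if __name__ == "__main__":
    print(zwsp_category)
    print(default_tokens)
    print(split_on_zwsp("ab\u200bcd"))
```

The ASCII strings are placeholders; real input would be text in which
ZWSP is typed between words.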
We are developing a dictionary-based break iterator, like the one ICU
has for Thai. The question is: does OpenOffice have special code for
Thai tokenization, or is the ICU break iterator enough?
I want to know whether, besides writing the ICU break iterator, I also
have to do something in OOo.
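To make clear what I mean by "dictionary-based", here is a minimal
sketch of greedy longest-match segmentation. It is not ICU's actual
algorithm and not our code; the function name segment and the tiny word
list are assumptions for illustration only.

```python
def segment(text, dictionary):
    """Greedy longest-match segmentation.

    At each position, take the longest dictionary word that starts there.
    If no word matches, emit a single character so the loop always advances.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink down to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


if __name__ == "__main__":
    words = {"the", "cat", "sat"}  # placeholder dictionary
    print(segment("thecatsat", words))
```

A break iterator would report the offsets between these tokens rather
than the tokens themselves. Greedy matching can also pick the wrong word
when dictionary entries overlap, which a real implementation has to
handle.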
Thanks,
Javier