Re: [dev] Unicode---Give us all of it!

Stephan Bergmann Mon, 13 Nov 2006 06:00:33 -0800

Niklas Nebel wrote:

Stephan Bergmann wrote:
I doubt that it is that many places that need to be changed. (For example, what do you think needs to be done for "text import/export"?)
The obvious changes for text import:
- Separator characters are user-supplied, so they can no longer be handled as a sal_Unicode.

Speaking about matching strings: Canonical-equivalent (<http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf>, D24) Unicode sequences (e.g., <U+006F LATIN SMALL LETTER O, U+0308 COMBINING DIAERESIS> is canonical-equivalent to <U+00F6 LATIN SMALL LETTER O WITH DIAERESIS>) should be treated as being identical. So when a user-supplied separator is specified as <U+006F,U+0308>, it should match <U+00F6> in the imported text, and vice versa. (This is probably handled most easily by converting all input to some Unicode Normalization Form.) Whether you want to address this issue together with or independent of the surrogates issue I cannot say.
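
To illustrate the normalization idea, here is a minimal sketch in Python (not OpenOffice.org code). The helper name split_on_separator is hypothetical, and the choice of NFC is an assumption; any one normalization form applied consistently to both the text and the separator would serve. Note that the returned fields are normalized too, which may or may not be acceptable for an import filter.

```python
import unicodedata


def split_on_separator(text, separator, form="NFC"):
    """Split text on separator, treating canonical-equivalent
    sequences as identical by normalizing both sides first.

    Hypothetical helper for illustration only; the fields that
    come back are in the chosen normalization form.
    """
    norm_text = unicodedata.normalize(form, text)
    norm_sep = unicodedata.normalize(form, separator)
    return norm_text.split(norm_sep)


# Separator given as <U+006F, U+0308>, text contains <U+00F6>.
fields_a = split_on_separator("a\u00f6b\u00f6c", "o\u0308")

# And vice versa: separator <U+00F6>, text contains <U+006F, U+0308>.
fields_b = split_on_separator("ao\u0308bo\u0308c", "\u00f6")
```

In both calls the separator matches, so each result is the three fields a, b, c.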

- Where fixed field width is used, characters instead of code units have to be counted.

I don't know, but "fixed to N characters" might or might not be as useful/useless a concept as "fixed to N UTF-16 code units."
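
A small Python sketch of how the two counts can diverge, and of why neither necessarily equals what a user would call a "character". The function names are hypothetical, and "character" is taken here to mean Unicode code point, which is only one possible reading.

```python
def utf16_code_units(s):
    """Number of 16-bit code units in the UTF-16 encoding of s."""
    return len(s.encode("utf-16-le")) // 2


def code_points(s):
    """Number of Unicode code points in s (Python strings are
    sequences of code points)."""
    return len(s)


# U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP, so it needs
# a surrogate pair (two code units) in UTF-16.
non_bmp = "a\U0001D11Eb"
non_bmp_units = utf16_code_units(non_bmp)    # 4
non_bmp_points = code_points(non_bmp)        # 3

# <U+006F, U+0308> is two code points and two code units, yet it
# displays as a single o-with-diaeresis.
combining = "o\u0308"
combining_units = utf16_code_units(combining)    # 2
combining_points = code_points(combining)        # 2
```

The second case is why counting code points instead of code units does not by itself make a fixed field width line up with what is shown on screen.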


-Stephan

The less obvious ones start to appear once you look through the details of implementations like the preview in the dialog.
Niklas


