Niklas Nebel wrote:
Stephan Bergmann wrote:
I doubt that it is that many places that need to be changed. (For example, what do you think needs to be done for "text import/export"?)

The obvious changes for text import:
- Separator characters are user-supplied, so they can no longer be handled as a sal_Unicode.
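
A rough illustration of why a single 16-bit unit is not enough, assuming the separator is instead kept as a string of one or more code points. The helper names `utf16_units` and `split_fields` are made up for this sketch and are not part of any existing import code:

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def split_fields(line, separator):
    """Split line on a separator given as a string (hypothetical helper)."""
    if not separator:
        raise ValueError("separator must not be empty")
    return line.split(separator)


# U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP, so it takes a
# surrogate pair (two 16-bit code units) in UTF-16.
SEPARATOR = "\U0001D11E"
separator_units = utf16_units(SEPARATOR)

# Treating the separator as a string works the same for BMP and non-BMP input.
fields = split_fields("a\U0001D11Eb\U0001D11Ec", SEPARATOR)
```

Here `separator_units` holds two code units (0xD834, 0xDD1E), which is exactly what a single sal_Unicode cannot represent, while an ordinary separator such as "," still yields one unit.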

Speaking about matching strings: Canonical-equivalent (<http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf>, D24) Unicode sequences (e.g., <U+006F LATIN SMALL LETTER O, U+0308 COMBINING DIAERESIS> is canonical-equivalent to <U+00F6 LATIN SMALL LETTER O WITH DIAERESIS>) should be treated as identical. So when a user-supplied separator is specified as <U+006F,U+0308>, it should match <U+00F6> in the imported text, and vice versa. (This is probably handled most easily by converting all input to some Unicode Normalization Form.) Whether you want to address this issue together with or independently of the surrogates issue I cannot say.
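
A minimal sketch of the normalization idea, assuming NFC is the chosen form and that it is acceptable for the field contents to come back normalized as well. `split_normalized` is a hypothetical name used only for illustration; `unicodedata.normalize` is from the Python standard library:

```python
import unicodedata


def split_normalized(line, separator, form="NFC"):
    """Split line on separator after bringing both to the same normalization form.

    Hypothetical helper: canonical-equivalent spellings of the separator
    then compare equal, at the cost of normalizing the field text too.
    """
    norm_line = unicodedata.normalize(form, line)
    norm_separator = unicodedata.normalize(form, separator)
    if not norm_separator:
        raise ValueError("separator must not be empty")
    return norm_line.split(norm_separator)


# Separator given as <U+006F, U+0308>, text contains the precomposed <U+00F6>.
decomposed_sep = split_normalized("a\u00F6b", "o\u0308")

# The reverse case: precomposed separator, decomposed sequence in the text.
precomposed_sep = split_normalized("ao\u0308b", "\u00F6")
```

Both calls split the line into "a" and "b". Without the normalization step, neither separator would be found in its respective text.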

- Where fixed field width is used, characters instead of code units have to be counted.

I don't know, but "fixed to N characters" might or might not be as useful/useless a concept as "fixed to N UTF-16 code units."
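
To make the difference between the two ways of counting concrete, here is a small sketch. It assumes "characters" means Unicode code points, which is only one possible reading, and the helper names are invented for this example:

```python
def utf16_length(text):
    """Number of UTF-16 code units needed for text."""
    return len(text.encode("utf-16-le")) // 2


def split_fixed_width(line, widths):
    """Cut line into fields whose widths are counted in code points.

    Hypothetical helper: Python strings are indexed by code point, so
    slicing never cuts a surrogate pair in half.
    """
    fields = []
    position = 0
    for width in widths:
        fields.append(line[position:position + width])
        position += width
    return fields


# One non-BMP character followed by five ASCII letters.
line = "\U0001D11Eabcde"
code_points = len(line)            # counts U+1D11E once
code_units = utf16_length(line)    # counts U+1D11E as a surrogate pair
columns = split_fixed_width(line, [3, 3])
```

For this line the code point count is 6 and the code unit count is 7, so a width of 3 counted in code units would end the first field one position earlier than the same width counted in code points. Note that code points are still not user-perceived characters: "e" followed by U+0301 COMBINING ACUTE ACCENT is two code points, so neither count is obviously the right notion of field width.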

-Stephan

The less obvious ones start to appear once you look through the details of implementations like the preview in the dialog.

Niklas
