Niklas Nebel wrote:
Stephan Bergmann wrote:
I doubt that it is that many places that need to be changed. (For
example, what do you think needs to be done for "text import/export"?)
The obvious changes for text import:
- Separator characters are user-supplied, so they can no longer be
handled as a single sal_Unicode (one UTF-16 code unit).
Speaking of matching strings: canonical-equivalent
(<http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf>, D24) Unicode
sequences should be treated as identical (e.g., <U+006F LATIN SMALL
LETTER O, U+0308 COMBINING DIAERESIS> is canonical-equivalent to
<U+00F6 LATIN SMALL LETTER O WITH DIAERESIS>). So when a
user-supplied separator is specified as <U+006F,U+0308>, it should match
<U+00F6> in the imported text, and vice versa. (This is probably
handled most easily by converting all input to some Unicode
Normalization Form.) Whether you want to address this issue together
with or independently of the surrogates issue, I cannot say.
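
To illustrate the normalization idea, here is a minimal sketch. It is
written in Python purely for brevity and is not OpenOffice.org code; the
function name split_fields and the choice of NFC are my assumptions.
Note that normalizing the whole line also changes the field contents,
and a separator that begins with a combining mark may need extra care.

```python
import unicodedata


def split_fields(line: str, separator: str) -> list[str]:
    """Split `line` on `separator`, treating canonical-equivalent
    sequences as identical by converting both to NFC first.

    Hypothetical helper for illustration only.
    """
    norm_sep = unicodedata.normalize("NFC", separator)
    if not norm_sep:
        raise ValueError("separator must not be empty")
    norm_line = unicodedata.normalize("NFC", line)
    return norm_line.split(norm_sep)


# Separator given as <U+006F, U+0308>, text contains <U+00F6>.
print(split_fields("a\u00f6b\u00f6c", "o\u0308"))  # ['a', 'b', 'c']

# And vice versa: separator <U+00F6>, text contains <U+006F, U+0308>.
print(split_fields("ao\u0308b", "\u00f6"))  # ['a', 'b']
```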
- Where fixed field width is used, characters instead of code units have
to be counted.
I don't know, but "fixed to N characters" might or might not be as
useful/useless a concept as "fixed to N UTF-16 code units."
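
To make the difference concrete, a small sketch (again Python for
illustration only; the helper names are hypothetical). It counts UTF-16
code units versus code points and slices fixed-width fields by code
point. The last example shows why I am unsure about the concept: a
combining sequence still counts as two "characters" either way.

```python
def utf16_code_units(text: str) -> int:
    """Number of UTF-16 code units needed to encode `text`."""
    return len(text.encode("utf-16-le")) // 2


def code_points(text: str) -> int:
    """Number of Unicode code points (Python strings index by these)."""
    return len(text)


def split_fixed_width(line: str, widths: list[int]) -> list[str]:
    """Cut `line` into fields whose widths are counted in code points."""
    fields = []
    pos = 0
    for width in widths:
        fields.append(line[pos:pos + width])
        pos += width
    return fields


# U+10400 lies outside the BMP, so UTF-16 needs a surrogate pair for it.
sample = "\U00010400BC"
print(utf16_code_units(sample))  # 4
print(code_points(sample))       # 3

# Counting code points keeps the supplementary character in one field.
print(split_fixed_width(sample + "DE", [1, 2, 2]) ==
      ["\U00010400", "BC", "DE"])  # True

# <U+006F, U+0308> is one user-perceived character but two code points.
print(code_points("o\u0308"))  # 2
```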
-Stephan
The less obvious ones start to appear once you look through the details
of implementations like the preview in the dialog.
Niklas