Hello Lazarus-List,

Friday, December 3, 2010, 7:27:51 PM, you wrote:

GF> The upper/lower tables in Unicodemappings are pure Unicode GF> (please take a look). They are in NO way country dependent. What GF> make you think " tables seems to be the country agnostic ones" In some languages some unicode codepoints have different uppercase/lowercase pair. In example "i" in english (and most others) region is uppercased to "I" while in Turkish it is "I"+Upperdot (i can not write it here). Take a look over: "Why Applications Fail With The Turkish Language" at http://www.i18nguy.com/unicode/turkish-i18n.html GF> As I said, I have taken this from LCLProc. GF> The Unicode > UTF8 should be ok... It does not generate a GF> codepoint for a character outside the Unicoderange. Well, in fact yes, try converting Unicode $FFFF to UTF8 (not tested, but 99% of plain implementations just overlook exceptions). GF> For UTF8 > Unicode, well this is a desiign question. There is GF> a function that generates an exception on a wrong sequence. What GF> else would you do? There are 2 problems, invalid sequences and malformed sequences, both are different beasts. Malformed sequences: They are unterminated UTF8 sequences. So they starts as an UTF8 sequece, but they ends abruptally with a non continuing mark. In example #128#1 Invalid sequences: This are writting most times intentionally to bypass protection systems, raise buffer overflows and other "funny" things. They exploit the ability of UTF8 to using one sequence obtain a totally different character. In example #$C0#$80 which will output NULL char which is in fact dangerous when playing with C-Style strings. Recomendation is to replace the invalid/malformed codepoint by "?" or better by the unicode error question mark '?' (U+FFFD), or raise an exception, but never eat it. In order to test you can use the stress test: http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt Take a look over UTF8ToUnicode in fpc sources (there is one border case that it is still wrong). 
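
To make the "replace, never eat" recommendation concrete, here is a
minimal sketch of a decoder that substitutes U+FFFD for every bad
sequence. It is written in Python only for brevity; it is not the
LCLProc or fpc code, and the names decode_utf8_strict and REPLACEMENT
are hypothetical. How many replacement characters to emit per bad
sequence is a policy choice; this sketch emits one per rejected lead
byte or sequence.

```python
REPLACEMENT = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER (hypothetical constant name)


def decode_utf8_strict(data: bytes) -> str:
    """Decode UTF-8, replacing bad input with U+FFFD instead of dropping it.

    Illustrative sketch only: rejects stray continuation bytes, truncated
    sequences, overlong encodings, surrogates and values above U+10FFFF.
    """
    out = []
    i = 0
    n = len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:
            # Plain ASCII byte.
            out.append(chr(b0))
            i += 1
            continue
        if 0xC2 <= b0 <= 0xDF:
            need, cp, min_cp = 1, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            need, cp, min_cp = 2, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF4:
            need, cp, min_cp = 3, b0 & 0x07, 0x10000
        else:
            # Stray continuation byte (0x80..0xBF) or a lead byte that can
            # never be valid (0xC0, 0xC1 are always overlong; 0xF5..0xFF
            # are out of range).
            out.append(REPLACEMENT)
            i += 1
            continue
        # Consume up to `need` continuation bytes (10xxxxxx).
        j = i + 1
        while j < n and j - i <= need and (data[j] & 0xC0) == 0x80:
            cp = (cp << 6) | (data[j] & 0x3F)
            j += 1
        if j - i - 1 < need:
            # Malformed: the sequence ended before it was complete.
            # Resume at the byte that broke it, so that byte is not eaten.
            out.append(REPLACEMENT)
            i = j
            continue
        if cp < min_cp or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            # Invalid: overlong encoding, surrogate, or beyond U+10FFFF.
            out.append(REPLACEMENT)
            i = j
            continue
        out.append(chr(cp))
        i = j
    return "".join(out)


if __name__ == "__main__":
    # Overlong NUL: must not decode to chr(0).
    print(repr(decode_utf8_strict(b"\xC0\x80")))
    # Stray continuation byte followed by an ordinary byte.
    print(repr(decode_utf8_strict(b"\x80\x01")))
```

With this sketch, #$C0#$80 never becomes a NULL char, and in #128#1 the
#1 survives because only the offending byte is replaced. Python's own
bytes.decode("utf-8", errors="replace") follows the same idea, and
errors="strict" corresponds to the raise-an-exception option.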

-- 
Best regards,
José
