Re: [Lazarus] Unicode on Windows

Hans-Peter Diettrich Mon, 09 Apr 2012 15:59:15 -0700

Marcos Douglas wrote:

> I still think about:
> DirectoryExists or DirectoryExistsUTF8
> ForceDirectoriesUTF8 or ForceDirectories
> Pos or UTF8Pos
> etc
>
> Depends what part of code you are...

Such problems may (should) go away with the new UnicodeString and AnsiString types, where an AnsiString carries an encoding field. The conversion between UTF-8 and the system codepage is then done automatically whenever required, and the xyUTF8 functions can be dropped.
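What "automatic conversion" amounts to at the byte level can be sketched in a few lines. The example is Python rather than Pascal and only models the effect of re-encoding: decode from one codepage, encode into another, with Windows-1252 (`cp1252`) assumed as the system codepage. It is not the FPC RTL mechanism, and the helper name `convert` is hypothetical.

```python
def convert(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode bytes from codepage `src` into codepage `dst`.

    Hypothetical helper: models what assigning between encoded strings
    with different codepages has to do behind the scenes.
    """
    return data.decode(src).encode(dst)


text = "Größe: 5 €"
utf8 = text.encode("utf-8")                # 14 bytes: ö and ß take 2 bytes each, € takes 3
ansi = convert(utf8, "utf-8", "cp1252")    # 10 bytes: one byte per character
back = convert(ansi, "cp1252", "utf-8")    # lossless here, cp1252 covers every character used

print(len(utf8), len(ansi), back == utf8)
```

The round trip is lossless only because cp1252 happens to contain every character of the sample; for characters outside the target codepage, Python's `encode` raises `UnicodeEncodeError` by default.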

I discourage the use of UTF8Pos, in particular together with the new (encoded) AnsiString type. When such a string is auto-converted, for whatever reason, the index returned by UTF8Pos becomes invalid. This is one of the downsides of encoded strings, and it suggests using UnicodeString in future code. Delphi enforced that move by changing String and Char to UnicodeString and WideChar, and Delphi compatibility has propagated that pressure into FPC. Continuing to use UTF-8 strings (AnsiString) will cost speed and memory unless the system codepage is UTF-8. If your code only contains String type strings, not AnsiString or UTF8String, then all your strings will become UnicodeStrings (UTF-16), for which the xyUTF8 functions are either inapplicable or only cause superfluous implicit string conversions.
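Why an index goes stale after conversion can be shown with simple arithmetic. This Python sketch (not Pascal; the helpers `pos_in_units` and `pos_in_codepoints` are hypothetical) returns the 1-based position of a substring counted in different units, with 0 meaning "not found" as with Pascal's `Pos`. Counting in code points is assumed to correspond to what UTF8Pos returns, since it counts UTF-8 characters rather than bytes.

```python
def pos_in_units(text: str, sub: str, encoding: str, unit_size: int = 1) -> int:
    """1-based position of `sub` in `text`, counted in code units of `encoding`.

    unit_size is the code unit size in bytes (1 for UTF-8/cp1252, 2 for UTF-16).
    Returns 0 if `sub` is not found, like Pascal's Pos.
    """
    i = text.find(sub)  # 0-based offset in code points, -1 if absent
    if i < 0:
        return 0
    prefix_bytes = len(text[:i].encode(encoding))
    return prefix_bytes // unit_size + 1


def pos_in_codepoints(text: str, sub: str) -> int:
    """1-based position counted in code points (characters); 0 if absent."""
    return text.find(sub) + 1


sample = "Größe: 5 €"
print(pos_in_codepoints(sample, "€"))           # character position
print(pos_in_units(sample, "€", "utf-8"))       # byte position in UTF-8
print(pos_in_units(sample, "€", "cp1252"))      # byte position in cp1252

clef = "\U0001D11E ab"  # G clef (U+1D11E) lies outside the BMP
print(pos_in_codepoints(clef, "b"))             # character position
print(pos_in_units(clef, "b", "utf-16-le", 2))  # UTF-16 code unit position
```

For the first sample the character position is 10 but the UTF-8 byte position is 12; byte 10 of the UTF-8 form is the "5". An index computed in one representation therefore points at the wrong character once the string has been converted to another.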

Now every user has the choice to stay with a specific FPC/Lazarus version that does not yet support the new string types, or to drop UTF-8 strings in favor of the new UTF-16 strings. Since most code only has to deal with the Unicode BMP (Basic Multilingual Plane), the difference between the length of a string in Chars (code units) and in characters has gone away with UTF-16: every BMP character occupies exactly one WideChar. Do you really need to find the position of a non-BMP character in a string and change exactly that character? Then you are on the safe side with StringReplace, which already worked with UTF-8 and will continue to work with UTF-16 and any other encoding. Using Char variables was already dangerous with UTF-8, where exotic ("astral") characters occupy 4 bytes. So I don't understand why Delphi now uses WideChar for Char instead of UnicodeChar, where it is guaranteed that every codepoint (except ligatures and similar text-processing stuff) can be stored in a UnicodeChar variable.
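The BMP argument can be checked by counting code units. This Python sketch (the helper `unit_counts` is hypothetical) reports how many UTF-16 code units and UTF-8 bytes a few sample characters need: only the character outside the BMP needs two UTF-16 units (a surrogate pair), while UTF-8 needs between one and four bytes.

```python
def unit_counts(s: str):
    """Return (code points, UTF-16 code units, UTF-8 bytes) for a string."""
    return (len(s), len(s.encode("utf-16-le")) // 2, len(s.encode("utf-8")))


# The last sample, MUSICAL SYMBOL G CLEF, is an "astral" character outside the BMP.
for ch in ["A", "é", "€", "\U0001D11E"]:
    _, u16, u8 = unit_counts(ch)
    in_bmp = ord(ch) <= 0xFFFF
    print(f"U+{ord(ch):04X}  BMP={in_bmp!s:5}  UTF-16 units={u16}  UTF-8 bytes={u8}")

# In UTF-16 the astral character is stored as the surrogate pair D834 DD1E.
print("\U0001D11E".encode("utf-16-be").hex())
```

Even the highest code point, U+10FFFF, needs only 4 bytes in today's UTF-8, which is why a single Char of either width cannot hold every character.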


DoDi


--
_______________________________________________
Lazarus mailing list
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus
