Re: [Lazarus] Unicode on Windows

Marcos Douglas Mon, 09 Apr 2012 18:50:15 -0700

On Mon, Apr 9, 2012 at 8:17 PM, Hans-Peter Diettrich
<[email protected]> wrote:
> Marcos Douglas schrieb:
>
>
>> I still think about:
>> DirectoryExists or DirectoryExistsUTF8
>> ForceDirectoriesUTF8 or ForceDirectories
>> Pos or UTF8Pos
>> etc
>>
>> It depends on which part of the code you are in...
>
>
> Such problems may (should) go away with the new Unicode- and AnsiString
> types, where AnsiString contains an Encoding field. Then the conversions
> between UTF-8 and the system codepage are done automatically, whenever
> required, and the xyUTF8 functions can be dropped then.
>
> I discourage the use of UTF8Pos, in particular together with the new
> (encoded) AnsiString type. When such a string is auto-converted, for some
> reason, the index returned by UTF8Pos will become invalid. This is one of
> the downsides of encoded strings, which suggests using UnicodeString in
> future code.
> Delphi enforced that move, by changing String and Char to UnicodeString and
> WideChar, and Delphi compatibility propagated that pressure into FPC. The
> continued use of UTF-8 strings (AnsiString) will result in a speed and
> memory usage penalty, unless the system codepage is UTF-8. If your code only
> contains String type strings, not AnsiString or UTF8String, then all your
> strings will become UnicodeStrings (UTF-16), for which the xyUTF8 functions
> are either inapplicable or will result only in superfluous implicit string
> conversions.
>
> Now every user has the choice to stay with a specific FPC/Lazarus version,
> that does not yet support the new string types, or to drop UTF-8 strings in
> favor of the new UTF-16 strings. Since most code has to deal with the
> Unicode BMP (Basic Multilingual Plane) only, the difference between the
> length of a string in (UTF-8) chars and in characters has gone away with
> UTF-16. Do you really see a need for finding the position of a non-BMP
> character in a string, and changing exactly that character in the string?
> Then you are on
> the safe side by using StringReplace, which already worked with UTF-8 and
> will continue to work with UTF-16 and whatever other encoding. The use of
> Char variables has been dangerous already with UTF-8, where exotic
> ("astral") characters take 4 bytes (up to 6 in the original UTF-8
> design). In that respect I don't
> understand why Delphi now uses WideChar for Char, instead of UnicodeChar,
> where it is guaranteed that every codepoint (except ligatures and similar
> text-processing stuff) can be stored in a UnicodeChar variable.
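
To make the index problem from the quoted text concrete, here is a minimal
sketch. It is written in Python only because that makes the encodings easy to
inspect; it does not use or imitate any FPC or Lazarus routine, and all helper
names are made up for illustration. It shows that the same character sits at a
different offset depending on whether the string is counted in UTF-8 bytes or
in UTF-16 code units, and that a non-BMP character needs more than one unit in
both encodings.

```python
# Illustration only: plain Python, no FPC/Lazarus API involved.
# All helper names below are hypothetical.

def utf8_len(s: str) -> int:
    """Length of s counted in UTF-8 bytes."""
    return len(s.encode("utf-8"))


def utf16_len(s: str) -> int:
    """Length of s counted in UTF-16 code units (2 bytes each)."""
    return len(s.encode("utf-16-le")) // 2


def utf8_byte_pos(needle: str, haystack: str) -> int:
    """1-based offset of needle counted in UTF-8 bytes, 0 if absent."""
    i = haystack.find(needle)
    return 0 if i < 0 else utf8_len(haystack[:i]) + 1


def utf16_unit_pos(needle: str, haystack: str) -> int:
    """1-based offset of needle counted in UTF-16 code units, 0 if absent."""
    i = haystack.find(needle)
    return 0 if i < 0 else utf16_len(haystack[:i]) + 1


if __name__ == "__main__":
    text = "Gr\u00f6\u00dfe: 5"   # "Größe: 5", BMP characters only
    clef = "\U0001D11E"           # U+1D11E, a character outside the BMP

    print(utf8_byte_pos("5", text))    # 10: the two umlaut letters take 2 bytes each
    print(utf16_unit_pos("5", text))   # 8: one code unit per BMP character
    print(utf8_len(clef), utf16_len(clef))  # 4 2
```

An index computed against one representation (10 here) points at the wrong
place, or past the end, once the string has been converted to the other
representation, which is why a replace-by-content routine such as
StringReplace is the safer choice.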


When will the new Unicode and AnsiString types (which contain an Encoding
field) arrive for us, the users of FPC 2.6.1? Is this already done?

Marcos Douglas

--
_______________________________________________
Lazarus mailing list
[email protected]
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus
