Michael Schnell wrote:
AFAIK, the decision to use UTF-8 is due to Linux using this encoding, so that no conversion is needed in the LCL system API.
IMO more important: no new string and char types (Wide...) are required, and no duplicate set of string-handling procedures. This may be essential for databases and communication as well.
This of course is bad on Windows, as the API there uses UTF-16 and everything needs to be recoded in the LCL system API on entry and exit.
The overhead may be negligible in direct API calls when these do real work. Strings in (visual) components can be converted once into the internally used representation (the one the OS display expects), so here too the conversion overhead can range from low to undetectable in the GUI.
Supposedly, exposing different string types at the interface between the LCL and user code, UTF8String on Linux and WideString (in a reference-counted, UTF-16-encoded version) on Windows, would be too confusing.
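To make the recoding on entry and exit concrete, here is a minimal sketch in Python (chosen only for illustration; it is not LCL code). The names `to_api` and `from_api` are hypothetical and merely stand in for whatever the widgetset layer does at a UTF-16 API boundary.

```python
# Sketch only: to_api / from_api are hypothetical helpers, not LCL functions.

def to_api(utf8_bytes: bytes) -> bytes:
    """Recode UTF-8 input into UTF-16LE, as a UTF-16 based API would expect."""
    return utf8_bytes.decode("utf-8").encode("utf-16-le")


def from_api(utf16_bytes: bytes) -> bytes:
    """Recode a UTF-16LE result back into UTF-8 for the caller."""
    return utf16_bytes.decode("utf-16-le").encode("utf-8")


# "Gr\u00f6\u00dfe" is the German word for "size"; two of its five characters are non-ASCII.
caption = "Gr\u00f6\u00dfe".encode("utf-8")
wide = to_api(caption)           # conversion on entry
round_trip = from_api(wide)      # conversion on exit

print(len(caption), len(wide), round_trip == caption)
```

The conversion is lossless in both directions, so the cost is purely the recoding work per call, which is what the argument about overhead above is concerned with.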
A *portable* UTF string implementation should be restricted, eliminating direct and indexed access to chars (which become substrings). A dedicated UTF16 class/type can be added at any time, as an optional package.
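A small Python sketch of why indexed access stops yielding chars: in UTF-8 one character may span several bytes, so a single byte index can land in the middle of a character, and "the character at position i" has to be returned as a substring. The helper `char_at` is a hypothetical name used only for this illustration.

```python
# Sketch only: char_at is a hypothetical helper, not part of any library.

raw = "na\u00efve".encode("utf-8")   # "\u00ef" (i with diaeresis) takes two bytes

# Byte index 2 is only the lead byte of the two-byte sequence.
lead_byte = raw[2:3]
try:
    lead_byte.decode("utf-8")
    lead_is_valid_char = True
except UnicodeDecodeError:
    lead_is_valid_char = False


def char_at(utf8_bytes: bytes, index: int) -> bytes:
    """Return the index-th character as a UTF-8 substring (one or more bytes)."""
    return utf8_bytes.decode("utf-8")[index].encode("utf-8")


third_char = char_at(raw, 2)

print(len(raw), lead_is_valid_char, len(third_char))
```

Note that `char_at` has to scan the string to find the character, which is the cost a restricted, portable string type would make explicit instead of hiding behind `s[i]`.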
OTOH I agree that the weak (nonexistent) distinction between Ansi and UTF-8 strings is not pleasing. But here I'd establish a strong boundary between general (Unicode = UTF-8) strings and application-specific strings of a single (immutable) codepage. Remember that "Ansi" is not one specific encoding; it is a collection of encodings with single-byte code units, including UTF-8. The user can then choose a specific codepage (or UTF-16) for use inside his application, e.g. with an AppString type. It is then clear where conversions are required and have to be inserted automatically by the compiler.
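As a rough illustration of such a boundary, the following Python sketch pairs the bytes with one fixed codepage and only lets them cross into UTF-8 through explicit conversions. The `AppString` class here is a hypothetical stand-in for the type suggested above, not an existing FPC or LCL type; in the proposal the compiler would insert these conversions itself.

```python
# Sketch only: AppString is a hypothetical type for illustration.
from dataclasses import dataclass


@dataclass(frozen=True)
class AppString:
    """Application string: raw bytes plus one fixed (immutable) codepage."""
    data: bytes
    codepage: str

    @classmethod
    def from_utf8(cls, utf8: bytes, codepage: str) -> "AppString":
        # Explicit conversion point: general UTF-8 -> application codepage.
        return cls(utf8.decode("utf-8").encode(codepage), codepage)

    def to_utf8(self) -> bytes:
        # Explicit conversion point: application codepage -> general UTF-8.
        return self.data.decode(self.codepage).encode("utf-8")


# The same "Ansi" byte means different characters under different codepages.
same_byte = b"\xe4"
western = AppString(same_byte, "cp1252").to_utf8().decode("utf-8")
cyrillic = AppString(same_byte, "cp1251").to_utf8().decode("utf-8")

print(western == cyrillic)
```

Because the codepage travels with the type, the byte `0xE4` can no longer be silently reinterpreted, which is the ambiguity that an untyped "Ansi" string leaves open.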
The Delphi model, with differently encoded strings in the same string type, can result in a lot of uncontrollable conversion overhead, easily outweighing the few optimizations possible with current AnsiStrings (assuming SBCS[1] only). The new ABI is also incompatible with existing DLLs built with earlier Delphi/BCB versions, causing trouble with third-party components that are not available for the new ABI. Okay, no such problems exist with open source components, but not all Lazarus add-ons or apps are necessarily open source.
[1] With MBCS charsets the same rules apply as with UTF-8, so UTF-8 can immediately replace all MBCS encodings. The decision about new string types therefore affects *only* current SBCS Ansi users; even ASCII users are not affected.
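The footnote can be checked with a few byte counts. This Python sketch uses cp1252 as an example SBCS codepage and cp932 as an example MBCS codepage; both choices are assumptions made only for this illustration.

```python
# Sketch only: cp1252 (SBCS) and cp932 (MBCS) are example codepages.

ascii_text = "Lazarus"
ascii_encodings = {cp: ascii_text.encode(cp) for cp in ("utf-8", "cp1252", "cp932")}
ascii_identical = len(set(ascii_encodings.values())) == 1

# SBCS user: one byte per char today, but two bytes for this char in UTF-8.
sbcs_len = len("\u00e4".encode("cp1252"))
sbcs_utf8_len = len("\u00e4".encode("utf-8"))

# MBCS user: already more than one byte per char, just like UTF-8.
mbcs_len = len("\u65e5".encode("cp932"))
mbcs_utf8_len = len("\u65e5".encode("utf-8"))

print(ascii_identical, sbcs_len, sbcs_utf8_len, mbcs_len, mbcs_utf8_len)
```

Pure ASCII text is byte-identical in all three encodings, MBCS code already cannot assume one byte per char, and only SBCS code loses that assumption when moving to UTF-8.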
DoDi
