Graeme Geldenhuys wrote:
> On 2011-02-14 20:03, Jürgen Hestermann wrote:
>> Do you mean that the compiler should convert the strings as needed in
>> the background (as between different integer types and/or floats) so
>> that you can call ListBox1.Items.Add(x) with x being UTF8string or
>> UTF16string or...?
> Yes, but in reality how often would such conversions happen? TStringList
> (used inside a TListBox) would use UnicodeString types. The encoding of
> that type would default to whatever platform you compiled on. I.e. under
> Linux it would default to UTF-8, and under Windows it would default to
> UTF-16.


That sounds like yet another approach. So far I see three models for how strings could be handled:

--------------
1.) Full programmer responsibility (current model):
The programmer is fully responsible for (and has full control over) the strings used in his program. Libraries mostly use UTF-8, with some exceptions such as API-related libraries. The programmer needs to know the string types used in all the libraries involved, and if conversions are needed he has to perform them manually.

Pros:
The programmer knows exactly what happens under the hood, so he can judge performance and incompatibilities (at least he should). When strings are saved to files they are compatible across OS platforms, because the programmer can use the same type in all cases, so files can be exchanged between platforms.

Cons:
Much harder to code, because he *needs* to know all the details of string encodings in the different libraries.
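To make the manual work of model 1 concrete, here is a minimal sketch, assuming the UTF8ToUTF16/UTF16ToUTF8 helpers from Lazarus' LazUTF8 unit (the variable names are just illustrative):

```pascal
uses LazUTF8; // conversion helpers from Lazarus' LazUtils package

var
  S8: string;        // UTF-8, as used by the LCL
  SW: UnicodeString; // UTF-16, as expected by e.g. the Windows wide APIs
begin
  S8 := 'Grüße';
  // The programmer must know the target encoding and convert by hand:
  SW := UTF8ToUTF16(S8); // before passing the string to a UTF-16 interface
  S8 := UTF16ToUTF8(SW); // and back again afterwards
end.
```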

--------------
2.) A generic "UnicodeString" is mapped to different real string types "under the hood", so the string type used in programs (and libraries like the LCL) differs from platform to platform. The programmer does not even know which type is used. If a conversion is still needed for special routines, it is done automatically in the background without the programmer having to know about it. Other concrete string types like UTF8string are available, but their use is not encouraged.

Pros:
Easy to code. In general, deeper knowledge about string encodings and their storage is not needed. String conversions are seldom needed.

Cons:
When non-Unicode strings are used on a platform (e.g. ANSI on Windows) but Unicode is required by the program, it becomes clumsy, because then the programmer has to use his own (Unicode) string type, and conversions are needed for all library and other functions. When strings are saved to files they may differ between platforms, so files cannot be exchanged across them. All libraries have to be rewritten to handle the different string types.
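A hypothetical sketch of what model 2's "under the hood" mapping could look like (TGenericString is an invented name for illustration, not an actual FPC type):

```pascal
type
  {$IFDEF WINDOWS}
  TGenericString = UnicodeString; // UTF-16 on Windows
  {$ELSE}
  TGenericString = AnsiString;    // UTF-8 on Linux and other platforms
  {$ENDIF}

// Code written against TGenericString compiles everywhere, but the
// bytes in memory (and in saved files) differ per platform.
```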


-------------
3.) A middle course: UTF-8 is chosen as the main string type, to be used whenever possible (within the LCL and other libraries), and programmers are also encouraged to use it, so that conversions become (more and more) unlikely. When using interfaces with different string types (like OS APIs), an automatic conversion happens in the background.

Pros:
Easy to code. No doubt for the programmer about the string type used and its capabilities; it is always UTF-8 for him. When strings are saved to disk they are UTF-8 on all platforms, so files can be exchanged between Linux and Windows (and others).

Cons:
Because the LCL and other libraries use UTF-8, there can be a performance impact when compiling for a non-UTF-8 OS (where the APIs use ANSI, UTF-16, or whatever).
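In model 3 the background conversion would, in effect, be confined to thin wrappers at the OS boundary. A sketch for Windows, again assuming LazUTF8 (SetCaptionUTF8 is an invented wrapper name):

```pascal
uses Windows, LazUTF8;

// The whole program and the LCL use UTF-8; only this boundary wrapper
// converts to the UTF-16 expected by the Windows wide API.
procedure SetCaptionUTF8(Wnd: HWND; const ACaption: string);
begin
  SetWindowTextW(Wnd, PWideChar(UTF8ToUTF16(ACaption)));
end;
```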



I would prefer model 3.)


--
_______________________________________________
Lazarus mailing list
[email protected]
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus