Graeme Geldenhuys wrote:
> On 2011-02-14 20:03, Jürgen Hestermann wrote:
>> Do you mean that the compiler should convert the strings as needed in
>> the background (as between different integer types and/or floats) so
>> that you can call ListBox1.Items.Add(x) with x being UTF8string or
>> UTF16string or...?
> Yes, but in reality how often would such conversions happen? TStringList
> (used inside a TListBox) would use UnicodeString types. The encoding of
> that type would default to whatever platform you compiled on. I.e. under
> Linux it would default to UTF-8, and under Windows it would default to
> UTF-16.


That sounds like yet another approach. So far I see three models for how strings could be handled:

--------------
1.) Full programmer responsibility (current model):
The programmer is fully responsible for (and has full control over) the strings used in his program. Libraries mostly use UTF-8, with some exceptions such as API-related libraries. The programmer needs to know the string types used in all the libraries involved, and if conversions are needed he has to perform them manually.

Pros:
The programmer knows exactly what happens under the hood, so he can judge performance and incompatibilities (at least he should). When strings are saved to files they are compatible across OS platforms, because the programmer can use the same type in all cases, so files can be exchanged between platforms.

Cons:
Much harder to code, because he *needs* to know all the details of string encodings in the different libraries.
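To make the manual work of model 1 concrete, here is a minimal sketch, assuming the UTF8ToUTF16/UTF16ToUTF8 helpers from Lazarus' LazUTF8 unit (the variable names are just illustrative):

```pascal
uses LazUTF8; // conversion helpers from Lazarus' LazUtils package

var
  S8: string;        // UTF-8, as used by the LCL
  SW: UnicodeString; // UTF-16, as expected by e.g. the Windows wide APIs
begin
  S8 := 'Grüße';
  // The programmer must know the target encoding and convert by hand:
  SW := UTF8ToUTF16(S8); // before passing the string to a UTF-16 interface
  S8 := UTF16ToUTF8(SW); // and back again afterwards
end.
```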

--------------
2.) A generic "UnicodeString" is mapped to different real string types "under the hood", so the string type used in programs (and libraries like the LCL) differs from platform to platform. The programmer does not even know which type is used. If a conversion is still needed for special routines, it is done automatically in the background without the programmer having to know about it. Other concrete string types like UTF8string are available, but their use is not encouraged.

Pros:
Easy to code. In general, deeper knowledge about string encodings and their storage is not needed. String conversions are seldom needed.

Cons:
When non-Unicode strings are used on a platform (e.g. ANSI on Windows) but Unicode is required by the program, it becomes clumsy, because then the programmer has to use his own (Unicode) string type, and conversions are needed for all library and other functions. When strings are saved to files they may differ between platforms, so files cannot be exchanged across them. All libraries have to be rewritten to handle the different string types.
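A hypothetical sketch of what model 2's "under the hood" mapping could look like (TGenericString is an invented name for illustration, not an actual FPC type):

```pascal
type
  {$IFDEF WINDOWS}
  TGenericString = UnicodeString; // UTF-16 on Windows
  {$ELSE}
  TGenericString = AnsiString;    // UTF-8 on Linux and other platforms
  {$ENDIF}

// Code written against TGenericString compiles everywhere, but the
// bytes in memory (and in saved files) differ per platform.
```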


-------------
3.) A middle course: UTF-8 is chosen as the main string type, to be used whenever possible (within the LCL and other libraries), and programmers are also encouraged to use it, so that conversions become (more and more) unlikely. When using interfaces with different string types (like OS APIs), an automatic conversion happens in the background.

Pros:
Easy to code. No doubt for the programmer about the string type used and its capabilities; it is always UTF-8 for him. When strings are saved to disk they are UTF-8 on all platforms, so files can be exchanged between Linux and Windows (and others).

Cons:
Because the LCL and other libraries use UTF-8, there can be a performance impact when compiling for a non-UTF-8 OS (where the APIs use ANSI, UTF-16, or whatever).
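In model 3 the background conversion would, in effect, be confined to thin wrappers at the OS boundary. A sketch for Windows, again assuming LazUTF8 (SetCaptionUTF8 is an invented wrapper name):

```pascal
uses Windows, LazUTF8;

// The whole program and the LCL use UTF-8; only this boundary wrapper
// converts to the UTF-16 expected by the Windows wide API.
procedure SetCaptionUTF8(Wnd: HWND; const ACaption: string);
begin
  SetWindowTextW(Wnd, PWideChar(UTF8ToUTF16(ACaption)));
end;
```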



I would prefer model 3.)


--
_______________________________________________
Lazarus mailing list
[email protected]
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus