Re: [Lazarus] rewriting of LConvEncoding

Hans-Peter Diettrich Fri, 24 Sep 2010 03:43:28 -0700

Guy Fink wrote:

UTF32 is not supported anywhere in FPC at the moment, and to be honest, I
don't see a reason to start now.  The Unicode Delphis also don't
provide a type for it.  It is simply the most practical format, and
the few places


Is that really a reason not to start support for it?


What kind of support are you missing?

I don't think
so. I even think it is a reason to support it: Delphi does not have
full Unicode support, FPC will have.


What kind of applications will need such support?

IMO it's perfectly sufficient for 99.999% of all applications when Unicode text can be stored and displayed. For mere storage the encoding is irrelevant, or given by database datatypes.

For display purposes the OS specifies the encoding to use.

Further direct processing of such strings is limited to comparison, search, extraction and concatenation of substrings, which is also possible with every encoding, with no speed penalty. Transformations (upper, lower...) require dedicated functions that are provided by standard libraries, where again the libraries specify the supported encodings. Most such transformations apply *only* to the character-based (alphabetic) codepages in the BMP, not to "word"-based (Chinese, ancient Egyptian...) codepages.


For all these purposes support of UTF-8 and -16 is perfectly sufficient.
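
As an illustration of the point above, here is a minimal sketch (in Python rather than Pascal, operating on raw bytes) of search, extraction, concatenation and comparison done directly on UTF-8 data without converting to UTF-32 first. The sample strings and variable names are made up for this example. It relies on two properties of UTF-8: a byte-level match of valid UTF-8 inside valid UTF-8 always starts on a character boundary, and byte-wise ordering equals code point ordering.

```python
# Sketch: common string operations performed on UTF-8 bytes, with no decoding
# to code points. All names and sample data here are illustrative only.

haystack = "Grüße aus Köln, 東京 und Αθήνα".encode("utf-8")
needle = "東京".encode("utf-8")

# Search: plain byte search; the hit lands on a character boundary because
# UTF-8 lead bytes and continuation bytes never overlap in value.
pos = haystack.find(needle)

# Extraction: slice by the byte offsets returned by the search.
extracted = haystack[pos:pos + len(needle)]

# Concatenation: joining valid UTF-8 pieces yields valid UTF-8.
joined = haystack[:pos] + "Tokyo".encode("utf-8") + haystack[pos + len(needle):]

# Comparison: sorting by raw UTF-8 bytes gives the same order as sorting
# by code points (which is how Python compares str values).
words = ["Zebra", "Äpfel", "東京", "apple"]
by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))
by_code_points = sorted(words)

print(extracted.decode("utf-8"))
print(joined.decode("utf-8"))
print(by_bytes == by_code_points)
```

Note that this is code point order, not a language-aware collation; the latter needs library support in any encoding, UTF-32 included.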

The only place for 4-byte (UTF-32) characters might be a corresponding char type, but the existence of ligatures and other constructs strongly suggests using strings for storing even single character codes. For the same reason it's *not* wise to iterate through strings by index; instead, iterator functions for the next/preceding character index have to be used. Pascal sets of such a char type are impractical, wasting 128MB of memory for *every single* set variable or constant. Does anybody know of an alphabetic codepage with more than 256 character codes?
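
A minimal sketch of such iterator functions for UTF-8, again in Python on raw bytes. The function names `next_char_index`, `prev_char_index` and `code_points` are hypothetical, not taken from any FPC or Lazarus unit, and the input is assumed to be valid UTF-8.

```python
# Sketch: stepping through UTF-8 by code point instead of by byte index.
# Assumes valid UTF-8 input; all function names are made up for illustration.

def is_continuation(byte: int) -> bool:
    """True for UTF-8 continuation bytes, which have the bit pattern 10xxxxxx."""
    return (byte & 0xC0) == 0x80

def next_char_index(buf: bytes, i: int) -> int:
    """Byte index of the code point after the one starting at i (len(buf) at the end)."""
    i += 1
    while i < len(buf) and is_continuation(buf[i]):
        i += 1
    return i

def prev_char_index(buf: bytes, i: int) -> int:
    """Byte index of the code point before position i (0 at the start)."""
    i = max(i - 1, 0)
    while i > 0 and is_continuation(buf[i]):
        i -= 1
    return i

def code_points(buf: bytes):
    """Yield each code point of buf as a one-character string."""
    i = 0
    while i < len(buf):
        j = next_char_index(buf, i)
        yield buf[i:j].decode("utf-8")
        i = j

# "e" followed by a combining acute accent, then "t", a precomposed "é",
# a space, and a code point outside the BMP.
text = "e\u0301t\u00e9 \U0001F600".encode("utf-8")
chars = list(code_points(text))
print(len(text), len(chars))
```

The first two code points in the sample render as a single visible character, so even a fixed-width 4-byte char type would not make one array element equal to one displayed character. That is the reason for keeping single characters in strings.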

UTF32 is there in the world, and yes, it is wasteful. And so what? Is
that a reason to ignore it?

Please give just a *single* reasonable application where UTF-32 would result in an improvement over the existing string types and encodings. I cannot remember a single user who was *really* familiar with full Unicode text manipulation and all related complications, and who wanted to have a native UTF-32 encoding for strings.

Well, one of the reasons is that the unit is mainly used for
embedded applications (which includes DOS and win9x nowadays) or
special cases (like  very, very compatible installers), since on
normal targets the OS routines are used.


These routines do not support all of the codepages. Further, the aim
of a library is not to wrap some OS routines but to deliver
functionality to the developer to help him solve his problem.

The implementation and *continued* support of such additional libraries should be up to companies or (at least) appropriately skilled user groups, familiar with all implemented codepages. Everybody can start such projects, independently of any programming language and compiler. And there is no need that such libraries *must* become part of the core libraries, or that they *must* replace existing libraries. They can be implemented and used as additional libraries as well, and the *users* will judge their value.


DoDi


