Guy Fink wrote:

UTF-32 is not supported anywhere in FPC atm, and to be honest, I
don't see a reason to start now.  The Unicode Delphis also don't
provide a type for it.  It is simply the most practical format, and
the few places

Is that really a reason not to start support for it?

What kind of support are you missing?

I don't think
so. I even think it is a reason to support it: Delphi does not have
full Unicode support, FPC will.

What kind of applications will need such support?

IMO it's perfectly sufficient for 99.999% of all applications if Unicode text can be stored and displayed. For mere storage the encoding is irrelevant, or is dictated by the database data types.
For display purposes the OS specifies the encoding to use.

Further direct processing of such strings is limited to comparison, search, extraction and concatenation of substrings, which is also possible with every encoding, with no speed penalty. Transformations (upper, lower...) require dedicated functions, which are provided by standard libraries, where again the libraries specify the supported encodings. Most such transformations apply *only* to the character based (alphabetic) codepages in the BMP, not to "word" based (Chinese, ancient Egyptian...) codepages.
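A rough illustration of the search/extraction point, as a sketch in Python rather than FPC code. The sample text and the helper names `utf16_units` and `find_units` are made up for this example. The same substring is located directly in the UTF-8 bytes and in the UTF-16 code units, without decoding to code points first; only the resulting offsets differ, because each encoding counts in its own units.

```python
# Sample text with two 2-byte characters and one character outside the BMP.
text = "Gr\u00f6\u00dfe \U0001D11E clef"
needle = "clef"

def utf16_units(s):
    """Return the UTF-16 code units of s as a list of integers."""
    raw = s.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]

def find_units(haystack, pattern):
    """Naive search of one unit sequence in another; start index or -1."""
    n = len(pattern)
    for i in range(len(haystack) - n + 1):
        if haystack[i:i + n] == pattern:
            return i
    return -1

utf8_text = text.encode("utf-8")
utf8_pos = utf8_text.find(needle.encode("utf-8"))                # offset in bytes
utf16_pos = find_units(utf16_units(text), utf16_units(needle))   # offset in 16-bit units
cp_pos = text.find(needle)                                       # offset in code points

# Extraction works on the encoded form as well.
extracted = utf8_text[utf8_pos:utf8_pos + len(needle.encode("utf-8"))].decode("utf-8")

print(utf8_pos, utf16_pos, cp_pos)  # 13 9 8
```

The UTF-16 search is done on whole 16-bit units rather than raw bytes, so that a match can never start in the middle of a unit.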

For all these purposes support of UTF-8 and -16 is perfectly sufficient.

The only place for 4 byte (UTF-32) characters might be a corresponding char type, but the existence of ligatures and other constructs strongly suggests using strings for storing even single character codes. For the same reason it's *not* wise to iterate through strings by index; instead, iterator functions for the next/preceding character index have to be used. Pascal sets of such a char type are impractical: a set over a full 32 bit range needs 2^32 bits, wasting 512MB of memory for *every single* set variable or constant. Does anybody know of an alphabetic codepage with more than 256 character codes?
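What such next/preceding iterator functions could look like for UTF-8, again as a Python sketch and not an existing FPC API. The names `next_index`, `prev_index` and `code_points` are hypothetical, and the input is assumed to be valid UTF-8. The functions step from one code point to the next; ligatures and combining sequences would need further grouping on top of this, which is not shown.

```python
def next_index(buf: bytes, i: int) -> int:
    """Byte index of the code point following the one that starts at i."""
    i += 1
    # Continuation bytes have the bit pattern 10xxxxxx; skip over them.
    while i < len(buf) and (buf[i] & 0xC0) == 0x80:
        i += 1
    return i

def prev_index(buf: bytes, i: int) -> int:
    """Byte index of the code point preceding byte offset i (requires i > 0)."""
    i -= 1
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return i

def code_points(buf: bytes):
    """Yield each code point of a valid UTF-8 buffer as a one-character string."""
    i = 0
    while i < len(buf):
        j = next_index(buf, i)
        yield buf[i:j].decode("utf-8")
        i = j

# 'a' (1 byte), e-acute (2 bytes), musical G clef (4 bytes), 'b' (1 byte)
sample = "a\u00e9\U0001D11Eb".encode("utf-8")
chars = list(code_points(sample))
print(len(sample), len(chars))  # 8 4
```

Stepping by plain index through `sample` would land inside the 2 byte and 4 byte sequences; the iterator functions only ever return offsets at which a code point starts.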


UTF-32 exists in the world, and yes, it is wasteful. So what? Is
that a reason to ignore it?

Please name just a *single* reasonable application where UTF-32 would result in an improvement over the existing string types and encodings. I cannot remember a single user who was *really* familiar with full Unicode text manipulation and all related complications, and who wanted to have a native UTF-32 encoding for strings.


Well, one of the reasons is that the unit is mainly used for
embedded applications (which includes DOS and win9x nowadays) or
special cases (like very, very compatible installers), since on
normal targets the OS routines are used.

These routines do not support all of the codepages. Further, the aim
of a library is not to wrap some OS routines but to deliver
functionality to the developer to help him solve his problem.

The implementation and *continued* support of such additional libraries should be up to companies or (at least) appropriately skilled user groups, familiar with all implemented codepages. Everybody can start such projects, independently of any programming language and compiler. And there is no need for such libraries to become part of the core libraries, or to replace existing libraries. They can just as well be implemented and used as additional libraries, and the *users* will judge their value.

DoDi


--
_______________________________________________
Lazarus mailing list
[email protected]
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus