On Wed, Jun 07, 2006 at 02:57:35PM +0200, Mattias Gaertner wrote:
> On Wed, 7 Jun 2006 12:24:15 +0300
> ik <[EMAIL PROTECTED]> wrote:
> 
> > Hi,
> > 
> > AnsiString can contain UTF-8 inside. WideString is for UTF-16 and
> > UCS-2, which require more bytes than UTF-8.
> > 
> > Please note that each char in UTF-8 is two bytes (like AnsiString),
> > while UTF-16 is 4 bytes (if I remember correctly).
> 
> No.
> UTF-8 needs 1 to 4 bytes. For example, 1 byte for ASCII characters, 2
> bytes for German umlauts.
> UTF-16 needs 1 to 2 words for each character.
> For example, the two-character string 'a' plus umlaut 'o' needs 3 bytes as
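To make the byte counts in the quoted correction concrete, here is a minimal sketch in Python (chosen only for illustration, not FPC code). The helper name `encoded_sizes` is hypothetical; it simply encodes a string and measures the result, using UTF-16 little-endian without a BOM so the byte order mark is not counted.

```python
def encoded_sizes(text: str) -> dict:
    """Byte length of `text` in UTF-8 and in UTF-16 (little-endian, no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }

samples = {
    "ASCII 'a'": "a",               # U+0061: 1 byte in UTF-8, 1 word in UTF-16
    "umlaut 'ö'": "\u00f6",         # U+00F6: 2 bytes in UTF-8, 1 word in UTF-16
    "'a' + 'ö'": "a\u00f6",         # the two-character example quoted above
    "U+1F600": "\U0001F600",        # outside the BMP: 4 bytes in UTF-8, 2 words (surrogate pair) in UTF-16
}

for label, text in samples.items():
    print(label, encoded_sizes(text))
```

Only code points beyond U+FFFF need the second UTF-16 word; everything in the Basic Multilingual Plane fits in one.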

AFAIK a single visible character can take even more (a rare character
carrying several combining diacritics), up to 16 bytes or so.
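The point about diacritics is that one visible character may be several code points: a base letter followed by combining marks. A rough Python sketch (illustrative only; `utf8_len` is a hypothetical helper) shows a precomposed versus decomposed 'ö' and a contrived letter with four stacked marks. It does not try to reproduce the 16-byte figure, only the growth.

```python
import unicodedata

def utf8_len(s: str) -> int:
    """Number of bytes `s` occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))

# 'ö' precomposed (one code point) versus decomposed (NFD: 'o' + U+0308 COMBINING DIAERESIS)
precomposed = "\u00f6"
decomposed = unicodedata.normalize("NFD", precomposed)
print(len(precomposed), utf8_len(precomposed))   # 1 code point, 2 bytes
print(len(decomposed), utf8_len(decomposed))     # 2 code points, 3 bytes

# A contrived single visible character: 'a' with four stacked combining marks
# (U+0301 acute, U+0308 diaeresis, U+0323 dot below, U+0330 tilde below), 2 bytes each in UTF-8
stacked = "a\u0301\u0308\u0323\u0330"
print(len(stacked), utf8_len(stacked))           # 5 code points, 9 bytes
```

So "one character" can mean one byte, one code point, or one user-perceived grapheme, and string routines have to pick which one they count.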

Unicode is quite complicated (http://www.stack.nl/~marcov/unicode.jpg: roughly
1400 pages, each bigger than A4). I browsed through it, and ieew.

FPC and Lazarus have a considerable backlog of Unicode problems and
enhancements. This is probably partly historical: most of the developers
can live with the fairly standard Latin-1 (ISO 8859-1 or 8859-15)
encoding, or even with plain ASCII.

To really improve Unicode support significantly, I think there must be a
team (or at least one long-term volunteer) that really starts digging into
these issues and uses/needs Unicode every day.

Start bottom up: testing FPC's Unicode support (the language),
resolving/pinpointing/reproducing bugs, providing Unicode alternatives for
the string routines, testing the Unicode cleanliness of various system
libraries, etc., etc.
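As an illustration of why byte-oriented string routines need Unicode alternatives, here is a small Python sketch of a UTF-8-aware length function. The name `utf8_length` is hypothetical, and it assumes the input is already valid UTF-8 (no validation is done); it counts code points by skipping continuation bytes, which always have the bit pattern `10xxxxxx`.

```python
def utf8_length(data: bytes) -> int:
    """Count code points in UTF-8 encoded `data` by skipping continuation bytes (0b10xxxxxx).

    Assumes `data` is valid UTF-8; malformed input is not detected.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)

raw = "Gr\u00fc\u00dfe".encode("utf-8")   # "Grüße": ü and ß take 2 bytes each
print(len(raw))          # byte count, what a byte-oriented length reports
print(utf8_length(raw))  # code point count
```

Note this counts code points, not visible characters; combining marks would still be counted separately.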

Nothing is easy with language and internationalization; even ASCII is
not easy. Last week I had to change a simple capitalization routine because
a customer pointed out to me that in Dutch "ij" is a single letter in
typography and must be capitalized in its entirety ("IJ"). This seemed
logical to me as a native speaker, but while coding the capitalization
routines I didn't think of it.
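The "ij" case can be sketched in a few lines of Python. `capitalize_dutch` is a hypothetical helper written for this illustration, not the routine mentioned above: it treats a leading two-letter "ij" as one letter and uppercases both, handles only the plain two-letter spelling (not the U+0133 ligature), and leaves the rest of the word unchanged.

```python
def capitalize_dutch(word: str) -> str:
    """Capitalize the first letter of `word`, treating a leading Dutch 'ij' digraph as one letter.

    Hypothetical helper: only the two-letter spelling 'ij' is handled, not the
    single-code-point ligature U+0133; the rest of the word is left as-is.
    """
    if not word:
        return word
    if word[:2].lower() == "ij":
        return "IJ" + word[2:]
    return word[0].upper() + word[1:]

print("ijsselmeer".capitalize())       # naive: only the 'i' is uppercased
print(capitalize_dutch("ijsselmeer"))  # digraph-aware
print(capitalize_dutch("amsterdam"))
```

A real routine would also need to know the text is Dutch; in English a word starting with "ij" is rare but should not get this treatment.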

_________________________________________________________________
     To unsubscribe: mail [EMAIL PROTECTED] with
                "unsubscribe" as the Subject
   archives at http://www.lazarus.freepascal.org/mailarchives
