Re: [lazarus] About widestring programming...

lazarus . mramirez Wed, 07 Jun 2006 08:07:47 -0700

>UTF-8 needs 1 to 4 bytes. For example, 1 byte for ASCII characters, 2
>bytes for German umlauts.
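A quick way to check the quoted byte counts, sketched in Python purely for illustration (the thread itself is about Free Pascal/Lazarus; the sample characters are my own picks):

```python
# Byte length of a few sample characters when encoded as UTF-8.
samples = ["A", "ü", "€", "\U0001D11E"]  # ASCII, German umlaut, euro sign, G clef

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s) -> {encoded.hex()}")
```

This reports 1, 2, 3 and 4 bytes respectively, covering the whole 1-to-4 range.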


I think there's a "monobyte"/"monospace"/"fixed" UCS-1 equivalent for
UTF-8 that doesn't support all characters...

> UTF-16 needs 1 to 2 words for each character.

UCS-2 uses 2 bytes ALWAYS. Some tools, such as text editors, save files as
UTF-16 but internally use UCS-2.
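To illustrate the "1 to 2 words" rule and why UCS-2 cannot cover everything, here is a minimal Python sketch of the UTF-16 surrogate-pair arithmetic (the helper names are hypothetical, made up for this example):

```python
from typing import List


def utf16_code_units(cp: int) -> List[int]:
    """Return the UTF-16 code units (16-bit words) for one code point."""
    if cp < 0x10000:
        return [cp]                    # BMP: one word, identical to UCS-2
    v = cp - 0x10000                   # 20 bits remain after the offset
    high = 0xD800 + (v >> 10)          # high (lead) surrogate: top 10 bits
    low = 0xDC00 + (v & 0x3FF)         # low (trail) surrogate: bottom 10 bits
    return [high, low]


def fits_ucs2(cp: int) -> bool:
    """UCS-2 is always one 2-byte unit, so only non-surrogate BMP code points fit."""
    return cp < 0x10000 and not (0xD800 <= cp <= 0xDFFF)


for ch in ["ü", "\U0001D11E"]:
    cp = ord(ch)
    units = [hex(u) for u in utf16_code_units(cp)]
    print(f"U+{cp:04X}: UTF-16 {units}, fits UCS-2: {fits_ucs2(cp)}")
```

The umlaut takes one word and fits UCS-2; the G clef (U+1D11E) needs a surrogate pair (0xD834, 0xDD1E) and has no UCS-2 representation.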

"Fixed encodings" such as UCS-2/ANSI (the same number of bytes for every
character) are easier to use in a programming environment (libraries,
functions), while "multibyte encodings" such as UTF-16/MBCS are more
difficult to use.
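One concrete reason fixed encodings are easier: finding the n-th character is plain arithmetic, while in UTF-8 you have to scan from the start. A minimal Python sketch (function names are hypothetical):

```python
def utf8_offset(data: bytes, index: int) -> int:
    """Byte offset of the index-th character in UTF-8 data (O(n) scan).

    Continuation bytes look like 0b10xxxxxx; every other byte starts a
    new character, so we count those lead bytes until we reach `index`.
    """
    count = -1
    for pos, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:      # lead byte: a new character starts here
            count += 1
            if count == index:
                return pos
    raise IndexError("character index out of range")


def ucs2_offset(index: int) -> int:
    """Fixed 2-byte encoding: the offset is simple arithmetic (O(1))."""
    return 2 * index


data = "Grüße".encode("utf-8")
print(utf8_offset(data, 4), ucs2_offset(4))
```

For "Grüße" the 'e' sits at byte 6 in UTF-8 (because ü and ß take 2 bytes each) but at byte 8 in UCS-2, and only the UCS-2 offset can be computed without looking at the data.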

Although fixed encodings carry a "penalty" in space or memory, since some
characters are padded with empty (zero) bytes...
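The space penalty is easy to measure. This Python sketch compares UTF-8 and UTF-16LE sizes; for text that stays inside the BMP, as these samples do, the UTF-16 size equals the UCS-2 size:

```python
def encoded_sizes(text: str) -> tuple:
    """Return (UTF-8 size, UCS-2 size) in bytes, assuming BMP-only text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))  # no BOM


for text in ["Hello, world", "Grüße", "Ελληνικά"]:
    utf8_size, ucs2_size = encoded_sizes(text)
    print(f"{text}: UTF-8 {utf8_size} bytes, UCS-2 {ucs2_size} bytes")
```

Plain ASCII doubles in size (12 vs 24 bytes), while the Greek sample costs the same in both encodings.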

The UTF16/UCS-2 combination is still very useful.

Originally, Unicode was going to be UCS-2, until the Unicode committee
found out that this approach was kind of expensive: many languages didn't
fit in 2 bytes, while others carried an extra byte that was never going to
be used...

> The LCL will support UTF-8 and provide some extra functions for UTF-16,
> because UTF-8 is more compatible to existing pascal programs. This is not

Eventually, a UCS-2 LCL version may be useful. If I get enough spare time...

>linux/gtk2 has almost only UTF-8 fonts, and that's why the 'courier' font

Back when windowze (can I mention that O.S., or is it "forbidden"?) only
had ANSI code pages, I remember that when you needed characters for some
region you needed an additional font, like Occidental Arial, Cyrillic
Arial, East Europe Arial, Greek Arial, and so on.
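Those per-region fonts existed because an ANSI code page only defines what a byte means within one region. A small Python sketch decoding the same byte, 0xE0, under three Windows code pages (cp1252 Western, cp1251 Cyrillic, cp1253 Greek):

```python
# One byte, different letters depending on the active code page.
raw = bytes([0xE0])
for codepage in ["cp1252", "cp1251", "cp1253"]:
    ch = raw.decode(codepage)
    print(f"{codepage}: 0xE0 -> {ch} (U+{ord(ch):04X})")
```

Without knowing the code page, a program (or a font) cannot tell whether 0xE0 is a Latin à or a Cyrillic а, which is the ambiguity a 16-bit Unicode string removes.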

Could someone confirm whether that's the same for *Linux with non-Unicode
fonts?

We could use some kind of wrapper: if the user wants the "Courier" font
and writes Greek, we could "redirect" the "widestring" control to use the
"Greek Courier" font instead of the "Occidental Courier" font.
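A rough sketch of such a wrapper in Python. Everything here is hypothetical: the FONT_VARIANTS table, the regional font names, and the heuristic of taking the script from the first word of each letter's Unicode character name (good enough for Latin, Greek and Cyrillic letters, but not a real script-detection algorithm):

```python
import unicodedata

# Hypothetical table: (requested font, script) -> regional font variant.
FONT_VARIANTS = {
    ("Courier", "LATIN"): "Occidental Courier",
    ("Courier", "GREEK"): "Greek Courier",
    ("Courier", "CYRILLIC"): "Cyrillic Courier",
}


def dominant_script(text: str) -> str:
    """Guess the script by counting the first word of each letter's Unicode name."""
    counts = {}
    for ch in text:
        if ch.isalpha():
            script = unicodedata.name(ch, "UNKNOWN").split()[0]
            counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else "LATIN"


def resolve_font(requested: str, text: str) -> str:
    """Redirect the requested font to a regional variant when one is known."""
    return FONT_VARIANTS.get((requested, dominant_script(text)), requested)


print(resolve_font("Courier", "Καλημέρα"))  # Greek text
print(resolve_font("Courier", "Hello"))     # Latin text
print(resolve_font("Arial", "Καλημέρα"))    # no entry, falls back to "Arial"
```

Falling back to the requested font keeps the wrapper harmless when no regional variant is registered.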

An "O.S. Service Layer" for 2-byte characters ("widechar") and strings
("widestring") that emulates UTF-16/UCS-2 Unicode on top of ANSI/UTF-8/UCS-1
could be quite useful...
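As a very rough sketch of what such a layer could do, here is a Python version that keeps UTF-8 as the storage format and exposes 16-bit "widechar" units on demand. The function names and the strict_ucs2 switch are hypothetical, not an existing LCL or FPC API:

```python
import struct
from typing import List


def utf8_to_widestring(data: bytes, strict_ucs2: bool = False) -> List[int]:
    """Decode UTF-8 bytes into 16-bit "widechar" units.

    With strict_ucs2=True, characters outside the BMP are rejected (pure
    UCS-2); otherwise they become UTF-16 surrogate pairs.
    """
    text = data.decode("utf-8")
    if strict_ucs2 and any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("text contains characters that do not fit in UCS-2")
    raw = text.encode("utf-16-le")
    return list(struct.unpack(f"<{len(raw) // 2}H", raw))


def widestring_to_utf8(units: List[int]) -> bytes:
    """Re-encode 16-bit units (UCS-2 or UTF-16) back into UTF-8 bytes."""
    raw = struct.pack(f"<{len(units)}H", *units)
    return raw.decode("utf-16-le").encode("utf-8")


wide = utf8_to_widestring("Grüße".encode("utf-8"))
print([hex(u) for u in wide])
print(widestring_to_utf8(wide).decode("utf-8"))
```

Keeping UTF-8 underneath matches what the LCL already uses, while the 16-bit view is only materialized when a "widestring" control asks for it.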

I won't suggest using UTF-32/UCS-4 yet. One step at a time.

> unicode characters.

In windowze, you can inspect Unicode fonts to detect which segments
(Unicode ranges) they support. I haven't checked that on *Linux, but I
suppose it's the same.

One more thing: somewhere I read that Unix (c) used 7-bit characters
(UTF-7) and *Linux 8-bit characters (UTF-8). Could someone confirm that?
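For what it's worth, the numbers in those names refer to bits per code unit, not bytes: UTF-7 (RFC 2152) is a mail-oriented encoding whose output stays within 7-bit ASCII, and UTF-8 uses 8-bit code units. A small Python check using the standard codecs:

```python
# Encode the same text with UTF-7 and UTF-8 and inspect the byte values.
text = "Grüße"
utf7 = text.encode("utf-7")
utf8 = text.encode("utf-8")

print(utf7, all(b < 0x80 for b in utf7))  # UTF-7 bytes all fit in 7 bits
print(utf8, all(b < 0x80 for b in utf8))  # UTF-8 uses the 8th bit for ü and ß
```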

Just my 2 cents.

-----
Marco Aurelio Ramirez Carrillo
lazarus dot mramirez at star-dev dot com [dot mx]

