On Tue, Sep 09, 2008 at 04:08:40PM -0700, Gary Kline wrote:
> On Wed, Sep 10, 2008 at 12:39:41AM +0200, Roland Smith wrote:
> > On Tue, Sep 09, 2008 at 03:16:08PM -0700, Gary Kline wrote:
> > > > Because it is a hideous waste for most readers and writers of
> > > > English and other European languages.
> >
> > > I also argued that utf-8 was a waste of a whole byte per char
> > > for most of us.
> >
> > That's not true. UTF-8 is a variable-length encoding. It is backwards
> > compatible with ASCII, i.e. ascii characters are one byte in UTF-8 as
> > well. Are you thinking about UTF-16?
> >
> I don't know. (Mark Twain.) Back in the late 1990's I was
> assigned the project of converting all the utilities I had ported
> to three European languages. Until now I had no idea there was
> anything *but* utf-16, i.e. 2-bytes/char.

Both UTF-8 and UTF-16 are variable-width encodings.

> With memory seriously getting to be dirt-cheap, "wasting" 8 bits
> doesn't seem that big a deal.

Indeed.

> Maybe some future wizard will
> invent a UTF-32 that will hold all ~90 000 Chinese characters and
> these will be downsized automatically to UTF-8 when you're mixing
> Mandarin with, say, Cesk [Czech].

UTF-32 already exists, but it's a fixed-width (4 bytes) encoding.

> Hmm, somebody just told me that "aigu" is not English but French
> and means "acute". ...all these years i thought ... oh well.
> Anyway, do you know if '\0351' is a 16-bit character? it is 0xE9
> and decimal 233 and certainly should fit into a byte. just
> wondering.

Obviously it is an 8-bit character; anything in the range 0-255 is. In
ISO 8859-1(5) it is "é" (e with accent aigu). Please look up UTF-8,16,32
and ISO-8859-15 on Wikipedia for further enlightenment.

Roland
-- 
R.F.Smith                                   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914 B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)
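
The byte widths discussed above can be checked with a short Python sketch.
The helper name `widths` and the sample characters are assumptions chosen
purely for illustration; nothing here comes from the original thread.

```python
# Encoded size of a few characters in UTF-8, UTF-16 and UTF-32.
# Samples (chosen for illustration): e, e-acute, a CJK ideograph,
# and a character outside the Basic Multilingual Plane.
samples = ["e", "\u00e9", "\u4e2d", "\U0001d11e"]

def widths(ch):
    """Return the number of bytes ch occupies in each Unicode encoding."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),  # -le avoids the 2-byte BOM
        "utf-32": len(ch.encode("utf-32-le")),  # -le avoids the 4-byte BOM
    }

for ch in samples:
    print(f"U+{ord(ch):04X}", widths(ch))

# Octal 351 == 0xE9 == 233: a single byte in ISO 8859-1,
# but two bytes once encoded as UTF-8.
assert 0o351 == 0xE9 == 233
print("\u00e9".encode("iso-8859-1"), "\u00e9".encode("utf-8"))
```

Plain ASCII stays at one byte in UTF-8, "é" takes two, and only UTF-32
uses four bytes for every character. UTF-16 needs four bytes (a surrogate
pair) for the last sample, which is why it also counts as variable-width.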