Stewart Stremler wrote:

Perhaps it has something to do with the fact that the first time I saw
i10n and l10n, it wasn't in a very good font; I read them as "il0n" and
"llOn", and they made no sense whatsoever.

To me, it's pain level.

localization seems to be underneath my "chunk" processing length for reading; I absorb it quite readily. l10n forces a brain shift, because l and 1 are separated by very little Hamming distance, and the shift from localization to l10n just doesn't save enough characters to justify that jarring effect.

internationalization, on the other hand, seems to be above my chunk processing length. The Hamming distance between i and 1 is large enough that the abbreviation isn't so jarring, and the difference between internationalization and i18n saves enough characters that it seems worthwhile to absorb.

The risk isn't casual words that can be inferred from context,
but rather URLs that a user is instructed to go to and cannot check.

Yes, but the solution to that is for banks to issue a token to access their websites, just like you need a token to access the ATM.

Language is, by definition, messy and imprecise.

Why do China and Taiwan and Japan need efficient representations
for words?

Heh. Efficiency is in the eye of the beholder--number of strokes, space on the page, number of distinct characters, ease of learning, ease of reproduction. Kanji may be space-efficient, but it often uses more individual strokes. I can also argue that it may not even be space-efficient: many Kanji are at their limit of shrinkability when written at normal size, while English letters can generally be reduced by a photocopier quite significantly and still retain legibility.

Talking about efficiency and language is very subjective.

Heh. Right. Nobody takes me seriously *here*, and you think someone
who's made a career out of Unicode is going to take a suggestion to
scrap the whole thing and start over?

Certainly not without a concrete implementation, so that I can actually *see* how much better or worse your scheme is.

Actually, if you wanted to prove your superiority, put the glyphs into something like Dasher and let people play.

(If you look at the ASCII encoding, a lot of work went into making
it *sensible*. It's not a simple enumeration of the available glyphs.)

Riiiiight. So, how many of the 32 control characters below ASCII 0x20 do we actually use? And somehow everybody uses the C representations like "\0" rather than the ASCII name "NUL". Quick: which C escape is CR and which is LF? Not very mnemonic.
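
For the record, the mapping is easy to check--here's a minimal C sketch
that prints the ASCII codes hiding behind the usual escapes:

    #include <stdio.h>

    int main(void)
    {
        /* The ASCII names never appear in C source; only the escapes do. */
        printf("NUL = 0x%02x\n", (unsigned)'\0');  /* 0x00 */
        printf("LF  = 0x%02x\n", (unsigned)'\n');  /* 0x0a, line feed */
        printf("CR  = 0x%02x\n", (unsigned)'\r');  /* 0x0d, carriage return */
        return 0;
    }

(Answer: '\n' is LF at 0x0a, '\r' is CR at 0x0d.)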

So they went back and included all those family names, rarely used
characters, and historical characters?

Yes, I actually believe that they did. They even have code points for scripts like Linear B.

Unicode dropped a lot of "less frequently used" symbols--at least
that was the way it was the last time this conversation went around,
when I spent a fair bit of time reading up on the pro/con Unicode
arguments. It's exhausting, but not really exhaustive...

I believe that was necessary to fit a useful subset completely inside UTF-16 back when it was required to use only 2 bytes. Now that there is a mechanism (surrogate pairs) for combining two UTF-16 code units into a single Unicode code point, this is no longer necessary.
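
The arithmetic behind a surrogate pair is simple enough to sketch in a few
lines of C; the Linear B code point here is just an arbitrary example from
the supplementary planes:

    #include <stdio.h>

    int main(void)
    {
        unsigned long cp = 0x10000UL;  /* U+10000, LINEAR B SYLLABLE B008 A */
        unsigned long v  = cp - 0x10000UL;             /* offset above the BMP */
        unsigned hi = 0xD800 + (unsigned)(v >> 10);    /* high surrogate: top 10 bits */
        unsigned lo = 0xDC00 + (unsigned)(v & 0x3FF);  /* low surrogate: bottom 10 bits */
        printf("U+%05lX -> 0x%04X 0x%04X\n", cp, hi, lo);
        return 0;
    }

That prints "U+10000 -> 0xD800 0xDC00". Two 16-bit units carry 20 extra
bits, which is how UTF-16 reaches all the way up to U+10FFFF.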

2^64 gives us every possible on/off pattern of an 8x8 array of pixels.
Let's just declare 64 bits the new word size, and all get modern at the
same time. 64-bit addressable machines? Anyone? Let's just be fair about it.

8x8 is woefully insufficient for quite a lot of Kanji.
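
For what it's worth, the 2^64 figure above is just counting 8x8 one-bit
bitmaps, each of which fits exactly in one 64-bit word--a C sketch, with
made-up glyph bits:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical glyph: one byte per row, low byte first; within a
       byte the most significant bit is the leftmost pixel. */
    static const uint64_t glyph = 0x3C4299A1A199423CULL;

    static int pixel(uint64_t g, int row, int col)
    {
        return (int)((g >> (row * 8 + (7 - col))) & 1);
    }

    int main(void)
    {
        for (int row = 0; row < 8; row++) {
            for (int col = 0; col < 8; col++)
                putchar(pixel(glyph, row, col) ? '#' : '.');
            putchar('\n');
        }
        return 0;
    }

One word per glyph is tidy, but as noted, 8x8 resolution is the real
bottleneck.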

-a

