Stewart Stremler wrote:

Perhaps it has something to do with the fact that the first time I saw
i10n and l10n, it wasn't in a very good font; I read them as "il0n" and
"llOn", and they made no sense whatsoever.

To me, it's pain level.

localization seems to be underneath my "chunk" processing length for reading; I absorb it quite readily. l10n forces a brain shift, because l and 1 are separated by very little Hamming distance, and the shift from localization to l10n just doesn't save enough characters to justify that jarring effect.

internationalization, on the other hand, seems to be above my chunk processing length. The Hamming distance between i and 1 is large enough that the abbreviation isn't so jarring, and the difference between internationalization and i18n saves enough characters that it seems worthwhile to absorb.

The risk isn't casual words that can be inferred from context,
but rather URLs that a user is instructed to go to and cannot check.

Yes, but the solution to that is for banks to issue a token to access their websites, just like you need a token to access the ATM.

Language is, by definition, messy and imprecise.

Why do China and Taiwan and Japan need efficient representations
for words?

Heh. Efficiency is in the eye of the beholder--number of strokes, space on the page, number of distinct characters, ease of learning, ease of reproduction. Kanji may be space-efficient, but it often uses more individual strokes. I can also argue that it may not even be space-efficient: many Kanji are at their limit of shrinkability when written at normal size, while English letters can generally be reduced by a photocopier quite significantly and still retain legibility.

Talking about efficiency and language is very subjective.

Heh. Right. Nobody takes me seriously *here*, and you think someone
who's made a career out of Unicode is going to take a suggestion to
scrap the whole thing and start over?

Certainly not without a concrete implementation, so that I can actually *see* how much better or worse your scheme is.

Actually, if you wanted to prove your superiority, put the glyphs into something like Dasher and let people play.

(If you look at the ASCII encoding, a lot of work went into making
it *sensible*. It's not a simple enumeration of the available glyphs.)

Riiiiight. So, how many of the 32 control characters below ASCII 0x20 do we actually use? And somehow everybody uses the C representations like "\0" rather than the ASCII name "NUL". Quick: which C escape is CR and which is LF? Not very mnemonic.
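
For the record, the mapping is easy to check--here's a minimal C sketch
that prints the ASCII codes hiding behind the usual escapes:

    #include <stdio.h>

    int main(void)
    {
        /* The ASCII names never appear in C source; only the escapes do. */
        printf("NUL = 0x%02x\n", (unsigned)'\0');  /* 0x00 */
        printf("LF  = 0x%02x\n", (unsigned)'\n');  /* 0x0a, line feed */
        printf("CR  = 0x%02x\n", (unsigned)'\r');  /* 0x0d, carriage return */
        return 0;
    }

(Answer: '\n' is LF at 0x0a, '\r' is CR at 0x0d.)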

So they went back and included all those family names, rarely used
characters, and historical characters?

Yes, I actually believe that they did. They even have code points for scripts like Linear B.

Unicode dropped a lot of "less frequently used" symbols--at least
that was the way it was the last time this conversation went around,
when I spent a fair bit of time reading up on the pro/con Unicode
arguments. It's exhausting, but not really exhaustive...

I believe that was necessary to fit a useful subset completely inside UTF-16 back when it was required to use only 2 bytes. Now that there is a mechanism (surrogate pairs) for combining two UTF-16 code units into a single Unicode code point, this is no longer necessary.
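
The arithmetic behind a surrogate pair is simple enough to sketch in a few
lines of C; the Linear B code point here is just an arbitrary example from
the supplementary planes:

    #include <stdio.h>

    int main(void)
    {
        unsigned long cp = 0x10000UL;  /* U+10000, LINEAR B SYLLABLE B008 A */
        unsigned long v  = cp - 0x10000UL;             /* offset above the BMP */
        unsigned hi = 0xD800 + (unsigned)(v >> 10);    /* high surrogate: top 10 bits */
        unsigned lo = 0xDC00 + (unsigned)(v & 0x3FF);  /* low surrogate: bottom 10 bits */
        printf("U+%05lX -> 0x%04X 0x%04X\n", cp, hi, lo);
        return 0;
    }

That prints "U+10000 -> 0xD800 0xDC00". Two 16-bit units carry 20 extra
bits, which is how UTF-16 reaches all the way up to U+10FFFF.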

2^64 gives us every possible on/off pattern of an 8x8 array of pixels.
Let's just declare 64 bits the new word size, and all get modern at the
same time. 64-bit addressable machines? Anyone? Let's just be fair about it.

8x8 is woefully insufficient for quite a lot of Kanji.
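
For what it's worth, the 2^64 figure above is just counting 8x8 one-bit
bitmaps, each of which fits exactly in one 64-bit word--a C sketch, with
made-up glyph bits:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical glyph: one byte per row, low byte first; within a
       byte the most significant bit is the leftmost pixel. */
    static const uint64_t glyph = 0x3C4299A1A199423CULL;

    static int pixel(uint64_t g, int row, int col)
    {
        return (int)((g >> (row * 8 + (7 - col))) & 1);
    }

    int main(void)
    {
        for (int row = 0; row < 8; row++) {
            for (int col = 0; col < 8; col++)
                putchar(pixel(glyph, row, col) ? '#' : '.');
            putchar('\n');
        }
        return 0;
    }

One word per glyph is tidy, but as noted, 8x8 resolution is the real
bottleneck.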

-a

