begin quoting Andrew Lentvorski as of Thu, Oct 27, 2005 at 01:48:02PM -0700:
[snip]
> To me, it's pain level.
> 
> localization seems to be underneath my "chunk" processing length for 
> reading.  I absorb it quite readily.  l10n causes me to need to do a 
> brain shift because the l and 1 are not separated by much Hamming 
> distance.  The shift from localization to l10n just doesn't save enough 
> characters to justify its jarring effect.

Hmmm.... I like that way of looking at it.

> internationalization, on the other hand, seems to be above my chunk 
> processing length.  The Hamming distance between i and 1 is large enough 
> that it isn't so jarring.  And the difference between 
> internationalization and i18n is enough characters that it seems to be 
> worthwhile to absorb.

Certainly if you're trying for bricktext.
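As an aside on the Hamming-distance framing: strictly, Hamming distance counts differing bit positions between equal-length codes, and on raw ASCII bytes it doesn't track visual similarity at all -- bitwise, "i" is actually *closer* to "1" than "l" is, the opposite of how the glyphs look.  A quick sketch:

```python
def hamming(a: str, b: str) -> int:
    """Count differing bit positions between two ASCII characters."""
    return bin(ord(a) ^ ord(b)).count("1")

# Bitwise, 'i' is closer to '1' than 'l' is -- the reverse of
# their visual similarity, so the term is being used loosely here.
print(hamming("l", "1"))  # 5
print(hamming("i", "1"))  # 3
```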

> >The risk isn't that casual words that can be inferred from context,
> >but rather that URLs that a user is instructed to go to can't be 
> >checked.
> 
> Yes, but the solution to that is for banks to issue a token to access 
> their websites just like you need a token to access the ATM.

Presumably, this sort of thing falls into the man-in-the-middle attack
category.  Most token systems don't handle that sort of threat very
well.  The token model you use to access an ATM (your card) is a rather
weak approach.  Even if you went with a challenge-response token, the
attacker would just relay the challenge and obtain the appropriate
response.

Issuing a token that contains a public key would basically solve the
problem, but it raises additional deployment concerns in getting that
to work with the client infrastructure.

> Language is, by definition, messy and imprecise.

True. :)

> >Why do China and Taiwan and Japan need efficient representations
> >for words?
> 
> Heh.  Efficiency is in the eye of the beholder--number of strokes, space 
> on page, number of distinct characters, ease of learning, ease of 
> reproduction.  Kanji may be space efficient, but it often uses more 
> individual strokes.  I can also argue that it may not even be space 
> efficient.  Many Kanji are at their limit of shrinkability when written 
> at normal size;  English letters can generally be reduced by a 
> photocopier quite significantly and still retain legibility.

It's really hard to write kanji small, and harder still to make it
readable.  You can write English amazingly small with a very fine-tip
pen and a steady hand.  And then there's using a laser printer to print
4-point fonts.

> Talking about efficiency and language is very subjective.
 
And often heated. :)

> >Heh. Right. Nobody takes me seriously *here*, and you think someone
> >who's made a career out of unicode is going to take a suggestion to
> >scrap the whole thing and start over?
> 
> Certainly not without a concrete implementation so that I can actually 
> *see* how much better or worse you are.

Fair 'nuff.

> Actually, if you wanted to prove your superiority, put the glyphs into 
> something like Dasher and let people play.
 
How would that help?  That's font-selection, innit?

> >(If you look at the ASCII encoding, a lot of work went in to making
> >it *sensible*. It's not a simple enumeration of the available glyphs.)
> 
> Riiiiight.  So, how many of the 32 characters do we actually use below 
> ASCII 0x20?  And somehow everybody uses the C representations like "\0" 
> rather than the ASCII "NUL".  Quick, which C character is CR and which 
> is LF?  Not very mnemonic.

I wasn't thinking of mapping down to the control characters (although
I've used the structure of ASCII to deduce the decimal values of some
of the control characters when an ASCII chart wasn't around, years ago),
but rather how A and a are related, for example.  Related symbols have
related values, in a meaningful way.
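That structure is easy to demonstrate: upper- and lowercase letters differ by a single bit (0x20), the control characters are a letter's value with the high bits stripped (which is exactly the Ctrl-key trick), and the digits decode arithmetically:

```python
# Case differs by exactly one bit: 'A' is 0x41, 'a' is 0x61.
assert ord("a") - ord("A") == 0x20
assert chr(ord("A") | 0x20) == "a"

# Control characters are the letter masked to 5 bits:
# Ctrl-M is CR (13, C's '\r'), Ctrl-J is LF (10, C's '\n').
assert ord("M") & 0x1F == 13
assert ord("J") & 0x1F == 10

# Digits sit at 0x30..0x39, so their values fall out by subtraction.
assert ord("7") - ord("0") == 7
```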

> >So they went back and included all those family names, rarely used
> >characters, and historical characters?
> 
> Yes, I actually believe that they did.  They even have things for stuff 
> like Linear B.

Cool. Is this in UTF-32-space now?

[snip]
> I believe that was necessary to fit a useful subset completely inside 
> UTF-16 when it was required to only use 2 bytes.  Now that there are 
> mechanisms for creating pairs of UTF-16 symbols which represent one 
> Unicode code point, this is no longer necessary.

That would be "yes", I imagine. :)
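The pairing mechanism is mechanical: a code point above U+FFFF is offset by 0x10000 and split into two 10-bit halves, each dropped into a reserved surrogate range.  A sketch, using the first Linear B syllable (U+10000) as the example:

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    assert cp > 0xFFFF
    cp -= 0x10000
    high = 0xD800 | (cp >> 10)   # top 10 bits -> high surrogate
    low = 0xDC00 | (cp & 0x3FF)  # bottom 10 bits -> low surrogate
    return high, low


# U+10000 LINEAR B SYLLABLE B008 A
print([hex(u) for u in to_surrogates(0x10000)])  # ['0xd800', '0xdc00']
```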

Hm....

Okay. I'll have to ponder this for a bit.

> >2^64 gives us all permutations of an 8x8 array of pixels. Let's just
> >declare 64 bits the new wordsize, and all get modern at the same time.
> >64-bit addressable machines? anyone?  Let's just be fair about it.
> 
> 8x8 is woefully insufficient for quite a lot of Kanji.

Well, yes.  You'd have to use multiple arrays... :)
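For what it's worth, the arithmetic checks out: an 8x8 binary bitmap has exactly 2^64 states, so any such glyph packs into a single 64-bit word.  A sketch of the packing:

```python
def pack_glyph(rows: list[int]) -> int:
    """Pack an 8x8 bitmap (eight 8-bit row values) into one 64-bit word."""
    assert len(rows) == 8 and all(0 <= r <= 0xFF for r in rows)
    word = 0
    for r in rows:
        word = (word << 8) | r
    return word


assert 2 ** (8 * 8) == 2 ** 64              # every 8x8 bitmap fits in a u64
assert pack_glyph([0xFF] * 8) == 2 ** 64 - 1  # all 64 pixels set
```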

-Stewart

-- 
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg
