Re: Significant whitespace (was Re: Blogging sucks)

Philip Newton Tue, 18 Oct 2005 07:35:07 +0200 (CEST)

On 10/17/05, Peter da Silva <pe...@taronga.com> wrote:

> In fact ISO-10646 is way too conservative for my tastes. I think each
> distinct set of case transformation rules should have four 16-bit planes
> allocated to it, so that truly internationalised characters will be able
> to reliably toggle case when c&0x10000000 by flipping c&0x20000000. The
> current set of wishy-washy unified characters with c&0x10000000 == 0 should
> be left to rot like the hateful legacy things that they are.


This "toggle" you speak of implies that you believe there are only two cases.

Thanks to Unicode (and legacy character sets, and existing alphabets
which use digraphs which made their way into those legacy character
sets, and round-tripping between alphabets which use digraphs and
those which don't), we have three, though: upper-case, lower-case, and
title-case.

The difference between upper-case and title-case becomes apparent with
such characters as "nj" (ASCII surrogate for U+01CC LATIN SMALL LETTER
NJ), which becomes "Nj" in title-case but "NJ" in upper-case.
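
For the curious, here is a rough sketch of the three forms, assuming
Python 3 and its built-in str.upper() and str.title() methods (my choice
of language for illustration; nothing in this thread depends on it). The
variable names are made up for the example.

```python
import unicodedata

small = "\u01cc"          # LATIN SMALL LETTER NJ, a single code point
upper = small.upper()     # full upper-case mapping
title = small.title()     # title-case mapping for a word-initial letter

# Print the code point and the official character name of each form.
for label, ch in (("lower", small), ("upper", upper), ("title", title)):
    print(label, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The title-case form is a separate code point whose general category is
"Lt", which is how software can tell it apart from a plain upper-case
letter.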

I'm not sure whether the hate in this case is for the coded character
sets that allowed digraphs in as single characters, or for the fact
that, given that they exist, software ignores them by blindly
uppercasing rather than titlecasing when appropriate.

(That's not even considering the characters where casing depends on
context; the most famous examples are probably Greek sigma (which has
two lower-case forms, depending [roughly speaking] on whether it's the
last letter of the word) and the letter "i" (which has two upper-case
forms, one with dot and one without, depending on the language).)
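
Both cases can be poked at with a small sketch, again assuming Python 3
(CPython 3.3 or later, as far as I know, for the sigma handling). The
built-in methods are locale-independent, so the Turkish-style behaviour
is approximated by upper_tr(), a hypothetical helper written only for
this example and not part of any library.

```python
# Greek: str.lower() picks the final form of sigma at the end of a word.
word = "\u039f\u0394\u039f\u03a3"      # four capitals ending in SIGMA
lowered = word.lower()                 # ends in U+03C2 FINAL SIGMA
lone = "\u03a3".lower()                # isolated sigma gives U+03C3

# Dotted and dotless i: the built-in mappings ignore the language.
dotted_upper = "i".upper()             # plain "I"
dotless_upper = "\u0131".upper()       # dotless i also maps to "I"
dotted_capital_lower = "\u0130".lower()  # "i" plus U+0307 COMBINING DOT ABOVE


def upper_tr(text):
    """Hypothetical helper: upper-case text the Turkish way.

    Maps "i" to U+0130 (capital I with dot above) before applying the
    default mapping, which already turns dotless U+0131 into "I".
    """
    return text.replace("i", "\u0130").upper()


print(lowered, lone, upper_tr("istanbul"))
```

Note that the language has to be supplied from outside (here, by
choosing which function to call); the code points alone don't say
whether an "i" is Turkish or not.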
--
Philip Newton <philip.new...@gmail.com>
