Aaron Sherman <a...@ajs.com> wrote:
> There's just an undefined codepoint smack in the middle of the Greek
> uppercase letters (U+03A2). I'm sure the Unicode specs have a rationale for
> that somewhere, but my guess is that there's some thousand-year-old debate
> about the Greek alphabet behind it.

It becomes clearer if you also look at the corresponding lower-case characters:

U+03A1 Greek capital letter rho
U+03A2 (none)
U+03A3 Greek capital letter sigma

U+03C1 Greek small letter rho
U+03C2 Greek small letter final sigma
U+03C3 Greek small letter sigma

Greek words written in lower-case that end in a sigma use a special
glyph for that sigma; and Unicode allocates a codepoint to it for
roundtripping to legacy character sets.  There isn't a corresponding
upper-case final sigma.  Unicode leaves the gap in the upper-case
Greek range for neatness, effectively: adding 0x20 to the numeric
value of an upper-case character yields the corresponding lower-case
version.
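
The offset is easy to check mechanically. Here is a rough Python sketch (my own illustration, not anything from the Unicode specs; the helper names offset_lowercase and check_greek_block are made up for this message). It walks the capitals from alpha to omega, records any unassigned codepoints, and compares the language's own lower-casing against "add 0x20":

```python
import unicodedata

GREEK_CAPITAL_ALPHA = 0x0391
GREEK_CAPITAL_OMEGA = 0x03A9

def offset_lowercase(cp):
    """Return the character 0x20 above cp (the offset described above)."""
    return chr(cp + 0x20)

def check_greek_block():
    """Collect unassigned capitals, and capitals where the offset rule fails."""
    gaps, mismatches = [], []
    for cp in range(GREEK_CAPITAL_ALPHA, GREEK_CAPITAL_OMEGA + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Cn":      # Cn = unassigned codepoint
            gaps.append(cp)
        elif ch.lower() != offset_lowercase(cp):  # compare with Python's own mapping
            mismatches.append(cp)
    return gaps, mismatches

gaps, mismatches = check_greek_block()
print("unassigned:", [f"U+{cp:04X}" for cp in gaps])
print("offset mismatches:", [f"U+{cp:04X}" for cp in mismatches])
# The slot 0x20 above the gap is where the final sigma lives.
print(unicodedata.name(offset_lowercase(0x03A2)))
```

With the Unicode data I'd expect any recent Python to ship, the only gap reported should be U+03A2, with no mismatches among the assigned capitals.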

> I think that "Ā" .. "Ē" should be ĀĂĄĆĈĊČĎĐĒ

If that's in the hope of producing a more "intuitive" result, then why
not ĀB̄C̄D̄Ē?

That's only partly serious.  I'm acutely aware that choosing a baroque
set of rules makes life harder for both implementers and users (and,
in particular, risks ending up with an operator that has no practical
non-trivial use cases).

I note also that these A-macron and E-macron characters are in NFC.  I
think that, certainly by default, the difference between NFC and NFD
should be hidden from users.  That implies that, however "Ā" .. "Ē"
behaves, the NFD version should behave identically; and that
"B̄" .. "F̄" should behave as nearly equivalently as possible.
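
To make the NFC/NFD point concrete, here is a small Python sketch using the standard unicodedata module (the helper name same_text is hypothetical, and this only illustrates the normalization forms, not how any range operator should be implemented). A-macron has a precomposed form, so NFC and NFD differ in length; as far as I know B-macron has no precomposed form, so it stays as two codepoints under either normalization:

```python
import unicodedata

def same_text(a, b):
    """Treat two strings as equal if their NFC forms match."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

a_macron_nfc = "\u0100"                                   # precomposed A-macron
a_macron_nfd = unicodedata.normalize("NFD", a_macron_nfc) # "A" + combining macron

b_macron = "B\u0304"                                      # B + combining macron
b_macron_nfc = unicodedata.normalize("NFC", b_macron)     # no precomposed form to use

print(len(a_macron_nfc), len(a_macron_nfd))   # codepoint counts differ
print(same_text(a_macron_nfc, a_macron_nfd))  # but they are the same text
print(len(b_macron_nfc))                      # still two codepoints after NFC
```

Any rule defined purely on codepoints would treat the one-codepoint and two-codepoint spellings of A-macron differently, which is exactly the difference I'd want hidden.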

-- 
Aaron Crane ** http://aaroncrane.co.uk/
