On Wed, Jul 21, 2010 at 9:46 AM, Aaron Crane <p...@aaroncrane.co.uk> wrote:

>
> > I think that "Ā" .. "Ē" should yield ĀĂĄĆĈĊČĎĐĒ
>
> If that's in the hope of producing a more "intuitive" result, then why
> not ĀB̄C̄D̄Ē?
>
> That's only partly serious.  I'm acutely aware that choosing a baroque
> set of rules makes life harder for both implementers and users (and,
> in particular, risks ending up with an operator that has no practical
> non-trivial use cases).
>

Well... actually, I got to thinking (which is not my natural state) and I
think we need two approaches. I don't know if they're two operators, a
pragma or what, but there are definitely two things people want:


   - "x".succ_uni yields "x".ord incremented until the resulting codepoint
   "agrees" with "x". By agrees, I mean that it shares the same script and
   general category properties (major/minor). This is an important tool because
   it's universal.
   - "x".succ_loc yields the next character after "x" in the current locale.
   What convinced me that this is a peer to the above was when I thought about
   Japanese, where only a subset of the CJK ideographs are valid Japanese. You
   really need an index and collation for these that is outside of the basic
   Unicode properties.
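
To make the two ideas concrete, here is a rough sketch in Python rather than Perl 6, purely for illustration. Several things in it are assumptions of mine and not settled design: Python's standard unicodedata module exposes the general category but not the script property, so script_guess() stands in for the script by taking the first word of the character name (a heuristic, not the real property); LOCALE_ORDER is a hypothetical hand-written table, not a real locale API; and uni_range() is just a helper name I made up to show how a range could be built on top of succ_uni.

```python
import unicodedata

def script_guess(ch):
    # ASSUMPTION: the first word of the character name ("LATIN", "GREEK", ...)
    # is used as a stand-in for the real Unicode Script property, which the
    # Python standard library does not expose.
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else ""

def succ_uni(ch, limit=0x10FFFF):
    """Next codepoint after ch with the same general category and
    (approximate) script, or None if there is none."""
    cat = unicodedata.category(ch)      # two letters: major + minor, e.g. "Lu"
    script = script_guess(ch)
    cp = ord(ch) + 1
    while cp <= limit:
        cand = chr(cp)
        if unicodedata.category(cand) == cat and script_guess(cand) == script:
            return cand
        cp += 1
    return None

def uni_range(start, end):
    """Hypothetical helper: repeated succ_uni from start, stopping at end."""
    out = [start]
    cur = start
    while cur != end:
        cur = succ_uni(cur)
        if cur is None or ord(cur) > ord(end):
            break
        out.append(cur)
    return out

# ASSUMPTION: an illustrative per-locale ordering table. A real succ_loc
# would consult locale and collation data instead of a hard-coded string.
LOCALE_ORDER = {
    "da": "abcdefghijklmnopqrstuvwxyz\u00e6\u00f8\u00e5",  # Danish: ..., z, ae, o-slash, a-ring
}

def succ_loc(ch, locale):
    """Next character after ch in the given locale's ordering, or None."""
    order = LOCALE_ORDER[locale]
    i = order.find(ch)
    if i < 0 or i + 1 >= len(order):
        return None
    return order[i + 1]
```

With this sketch, uni_range("Ā", "Ē") skips the interleaved lowercase letters because their minor category differs, while succ_loc("z", "da") gives "æ", which no codepoint-walking rule would produce.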


So yes, if there's a locale in which ĀB̄C̄D̄Ē is the correct ordering, then
I do think that there should be some "Ā" .. "Ē" equivalent that yields the
above in that context. But, I'm not convinced it should be the default.


> I note also that this A-macron and E-macron are in NFC.  I think that,
> certainly by default, the difference between NFC and NFD should be
> hidden from users.  That implies that, however "Ā" .. "Ē" behaves, the
> NFD version should behave identically; and that "B̄" .. "F̄" should
> behave in the most equivalent way possible.
>

As I've said previously, I'm only discussing single "characters" which I'm
defining as single codepoints which are neither combining nor modifying. If
you like, we can have the conversation about what you do when you encounter
combining and modifying codepoints, and I do think I agree with you largely,
but I'd like to hold that for now. It's just too much of a rat-hole at this
point.
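
For what it's worth, the restriction I'm working under can be sketched like this (again in Python, only as an illustration). The reading of "modifying" as general categories Lm and Sk is my own assumption, and is_plain_char is a made-up name, not anything proposed for the language.

```python
import unicodedata

def is_plain_char(s):
    """True if s is exactly one codepoint that is neither a combining mark
    nor a modifier."""
    if len(s) != 1:
        return False
    cat = unicodedata.category(s)
    if cat.startswith("M"):        # Mn, Mc, Me: combining marks
        return False
    if cat in ("Lm", "Sk"):        # ASSUMPTION: "modifying" = modifier letters/symbols
        return False
    return True

# A-macron has a precomposed form, so NFC gives one codepoint and NFD gives two.
a_macron_nfc = unicodedata.normalize("NFC", "A\u0304")
a_macron_nfd = unicodedata.normalize("NFD", "\u0100")

# B-macron has no precomposed form, so it stays two codepoints even in NFC.
b_macron_nfc = unicodedata.normalize("NFC", "B\u0304")
```

Here is_plain_char(a_macron_nfc) is true, while is_plain_char(a_macron_nfd) and is_plain_char(b_macron_nfc) are both false, which is exactly the case I'm setting aside for now.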

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs
