On Wed, Jul 21, 2010 at 1:28 AM, Aaron Sherman <a...@ajs.com> wrote:

>
> For reference, this is the relevant section of the spec:
>
> Character positions are incremented within their natural range for any
> Unicode range that is deemed to represent the digits 0..9 or that is deemed
> to be a complete cyclical alphabet for (one case of) a (Unicode) script.
> Only scripts that represent their alphabet in codepoints that form a cycle
> independent of other alphabets may be so used. (This specification defers to
> the users of such a script for determining the proper cycle of letters.) We
> arbitrarily define the ASCII alphabet not to intersect with other scripts
> that make use of characters in that range, but alphabets that intersperse
> ASCII letters are not allowed.
>
>
> I'm not sure that all of that tracks with the Unicode standard's use of
> some of the terms, but based on what we've discussed, perhaps we could get
> more specific there:
>
> Character positions are incremented within their Unicode Script, but only
> in keeping with their General Category property. Thus C<"A"++> yields C<"B">
> which is the next codepoint, but C<"Ă"++> yields C<"Ą"> even though "ă"
> falls between the two, when incrementing codepoints. Should this prove
> problematic for any specific Unicode Script which requires special handling
> (e.g. because a "letter" really isn't used as a letter at all), such special
> handling may be applied, but the above is the general rule.
>
>
Oh, so close! I realized that I broke the original spec here. We need to
add back in:

There are two special cases: the ASCII-compatible lower-case letters (a-z)
and the ASCII-compatible upper-case letters (A-Z). For historical reasons,
these, by default, will not increment past the end of their ranges into the
higher-codepoint Latin characters.
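
As a rough sketch of how the general rule and the two special cases could fit together, here is a single-character successor in Python. The names succ_char, script_of, ascii_special and window are hypothetical, and the Script property is only approximated by the first word of the character name, since Python's standard library does not expose it. Wrapping "z" back to "a" is also an assumption; a full string increment would additionally need a carry step, which is not shown.

```python
import unicodedata

def script_of(ch):
    # Assumption: approximate the Unicode Script property by the first word
    # of the character name (e.g. "LATIN"); the stdlib has no script lookup.
    return unicodedata.name(ch, "").split(" ")[0]

def succ_char(ch, ascii_special=True, window=256):
    # Special cases: a-z and A-Z cycle within their own range and never
    # step into the higher-codepoint Latin characters.
    if ascii_special:
        for first, last in (("a", "z"), ("A", "Z")):
            if first <= ch <= last:
                return first if ch == last else chr(ord(ch) + 1)
    # General rule: next codepoint with the same (approximated) script
    # and the same General Category, searched within a bounded window.
    cat, script = unicodedata.category(ch), script_of(ch)
    stop = min(ord(ch) + 1 + window, 0x110000)
    for cp in range(ord(ch) + 1, stop):
        cand = chr(cp)
        if unicodedata.category(cand) == cat and script_of(cand) == script:
            return cand
    return None  # nothing suitable found inside the window
```

With this sketch, succ_char("Ă") gives "Ą" because the lowercase "ă" in between has a different General Category, and succ_char("z") stays inside a-z. With ascii_special=False the general rule alone would carry "z" on to "ß", which is the behaviour the special cases are meant to prevent.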


Note: we might want a pragma for that as well. I'd suggest that perhaps it
should be a locale-specific feature? So, if you set your locale to fr, then
you include in those ranges all of the Latin characters used in French.

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs
