For conforming charsets, XBD 6 requires the range <0>-<9> to be contiguous. By 
XBD 9.3.5, Rule 6, [:digit:] may include MBS elements aside from <0> to <9> 
in LC_CTYPE, but what the range [0-9] matches depends on whether additional 
characters have the same collation weights as the digits. If they do, the locale 
may need to define collating symbols that bracket the range of digits in the 
order list and use those in range expressions to ensure everything the locale 
considers a decimal digit is tested for.


The collating sequence for Japan might be something like:
collating-symbol <bgn-decimal>
collating-symbol <end-decimal>
order_start forward
...
<bgn-decimal>
<0>
<JPN_0> weight N
<1>
<一>    weight N+1
<2>
<二>
<3>
<三>
...
<9>
<JPN_9> weight N+8
<end-decimal>
...
order_end
 
and [0-9] would include <JPN_0> and the other digits, but not <JPN_9>. 
The range [[.bgn-decimal.]-[.end-decimal.]] should include <JPN_9> too.
I'm ambivalent about whether the standard should reserve symbol names like this 
for common ranges like digits, though.
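
To make the difference between the two ranges concrete, here is a minimal Python sketch that models the order list above as a plain list and treats a range as "every element positioned between the two endpoints". The uniform <JPN_n> names for all ten digits and the helper names are assumptions for illustration only; this is not how localedef or any real regex engine is implemented.

```python
# Collating symbols occupy a position in the order but match no character.
COLLATING_SYMBOLS = {"<bgn-decimal>", "<end-decimal>"}


def build_order():
    """Build a hypothetical order list: each ASCII digit is followed by
    its (assumed) Japanese counterpart, bracketed by the two symbols."""
    order = ["<bgn-decimal>"]
    for d in range(10):
        order.append(f"<{d}>")
        order.append(f"<JPN_{d}>")
    order.append("<end-decimal>")
    return order


ORDER = build_order()
POSITION = {name: index for index, name in enumerate(ORDER)}


def in_range(element, start, end):
    """True if element sits between start and end in the order list."""
    return POSITION[start] <= POSITION[element] <= POSITION[end]


def range_members(start, end):
    """All characters (not collating symbols) covered by the range."""
    return [
        e for e in ORDER
        if e not in COLLATING_SYMBOLS and in_range(e, start, end)
    ]


if __name__ == "__main__":
    print("<JPN_9>" in range_members("<0>", "<9>"))
    print("<JPN_9>" in range_members("<bgn-decimal>", "<end-decimal>"))
```

Under these assumptions the range from <0> to <9> stops one element short, because <JPN_9> is ordered after <9>, while the range between the two bracketing symbols covers all twenty digit elements.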
 
In a message dated 5/16/2018 4:49:44 AM Eastern Standard Time, 
[email protected] writes:
 
Geoff Clare <[email protected]> wrote:

> Stephane Chazelas <[email protected]> wrote, on 15 May 2018:
> >
> > OK, so to rephrase and make sure I understand correctly. In
> > locales other than C, [[:digit:]] will be guaranteed to match on
> > 0123456789 only but not [0-9]. 0123456789 are guaranteed to be
> > in that order but [0-9] is unspecified anyway outside of the C
> > locale.
> > 
> > That's a bit counter-intuitive
>
> Not really, when you consider that ranges should use the collation
> sequence, not character encodings. (For the C/POSIX locale that's
> required - for others it's not, but it's the obvious way to implement
> ranges with multibyte characters.)

I believe the real problem is the IBM i18n implementation, which internally uses 
collating values to evaluate ranges. With characters, this can result in 
strange effects, but it makes [[=o=]] easy to implement.
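
As a rough illustration of that trade-off, the Python sketch below uses a made-up table of primary collating values. The weights, the character set, and the function names are all assumptions, not taken from any real locale or from the IBM implementation. It shows that an equivalence class falls out of a simple weight comparison, and that a range evaluated the same way picks up characters one might not expect.

```python
# Hypothetical primary collating values; a real locale defines its own.
PRIMARY_WEIGHT = {
    "n": 14,
    "o": 15, "ó": 15, "ö": 15, "ô": 15,
    "p": 16,
}


def equivalence_class(ch):
    """Characters sharing ch's primary weight, in the spirit of [[=ch=]]."""
    weight = PRIMARY_WEIGHT[ch]
    return {c for c, w in PRIMARY_WEIGHT.items() if w == weight}


def range_by_weight(start, end):
    """Characters whose weight lies between the endpoints' weights,
    in the spirit of [start-end] evaluated on collating values."""
    low, high = PRIMARY_WEIGHT[start], PRIMARY_WEIGHT[end]
    return {c for c, w in PRIMARY_WEIGHT.items() if low <= w <= high}


if __name__ == "__main__":
    print(sorted(equivalence_class("o")))
    print(sorted(range_by_weight("n", "p")))
```

With this table, the equivalence class of "o" is just the set of entries with weight 15, and the range from "n" to "p" also matches the accented forms of "o", which is the kind of surprise a weight-based range can produce.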

For digits, I would expect that there is no other glyph between 0 and 9, but 
the digits may not be contiguous in terms of collating values.

Jörg

-- 
 EMail:[email protected] (home) Jörg Schilling D-13353 Berlin
[email protected] (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'

