Re: can [[:digit:]] match something other than 0123456789?

Shware Systems Tue, 15 May 2018 17:48:06 -0700

No. In the C locale, and in locale definitions whose charmap includes
definitions of <0> through <9>, [:digit:] will match [0-9]. In locales other
than C, it may not match what another locale uses for [0-9] if their charmap
assignments differ, and it may match additional characters. This is one of the
consequences of leaving too much implementation-defined: things that look like
they should be portable aren't actually guaranteed to be. Such a charmap might
name the characters ARAB_ZERO through ARAB_NINE and define math operations that
expect them in applications using that locale, in addition to what that
system's C locale specified when isdigit() was compiled. A locale writer like
that won't care that the locale isn't conforming, or that in that charmap an
ASCII <0> may be a multibyte (mbs) shift code. They want the charmap and locale
to be compatible with whatever Arabic font is available, even if adding
characters to digit risks false positives.
 
In a message dated 5/15/2018 6:27:03 PM Eastern Standard Time, 
[email protected] writes:


OK, so to rephrase and make sure I understand correctly: in
locales other than C, [[:digit:]] is guaranteed to match only
0123456789, while [0-9] is not. 0123456789 are guaranteed to be
in that order, but [0-9] is unspecified anyway outside of the C
locale.

That's a bit counter-intuitive and, as noted by @isaac at
https://unix.stackexchange.com/questions/414226/difference-between-0-9-digit-and-d/414230?noredirect=1#comment804362_414230,
is the opposite of what perl (in Unicode mode), php (in Unicode
mode), and pcre (with (*UCP)) do: their [0-9] matches 0123456789,
while their \d/[[:digit:]] match based on Unicode properties, so
they also match decimal digits other than 0123456789.

-- 
Stephane
