2018-05-15 16:55:45 -0500, Eric Blake:
> On 05/15/2018 03:43 PM, Stephane Chazelas wrote:
> >
> >Does that mean that [0-9] is also guaranteed to match on
> >0123456789 only? And that then [[:digit:]] in regexp/fnmatch is
> >close to useless as it's longer than [0-9]
>
> Yes, I think that's a fair conclusion for the C locale, by virtue of the
> fact that the standard requires the encoding for 0-9 to be contiguous and in
> order.
>
> >and is a bit
> >misleading as it suggests it would be affected by localisation
> >(like the other character classes) while it's not.
>
> It's still useful in non-C locales within regexp, since ALL uses of - for
> ranges within [] has unspecified (or was it implementation-defined)
> semantics outside of the C locale. Using a named reference guarantees the
> desired semantics of exactly 10 characters, rather than skirting on the
> grounds of whether the range operator behaves as desired in all locales
> rather than just the C locale.
[...]
OK, so to rephrase and make sure I understand correctly: in locales other than C, [[:digit:]] is guaranteed to match 0123456789 only, but [0-9] is not. The characters 0123456789 are guaranteed to be encoded contiguously and in that order, but the meaning of the [0-9] range is unspecified outside of the C locale anyway.

That's a bit counter-intuitive and, as noted by @isaac at https://unix.stackexchange.com/questions/414226/difference-between-0-9-digit-and-d/414230?noredirect=1#comment804362_414230, it is the opposite of what perl (in unicode mode), php (in unicode mode) and pcre (with (*UCP)) do. There, [0-9] matches 0123456789 only, while \d and [[:digit:]] match based on Unicode properties and so also match decimal digits other than 0123456789.

--
Stephane
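Python's `re` module is not discussed in the message and is used here only as an easily runnable stand-in. It behaves like the Unicode-mode engines described above. With a str pattern, `[0-9]` is a plain code-point range covering only the ASCII digits, while `\d` matches any Unicode decimal digit unless the `re.ASCII` flag is given. Python's `re` does not support POSIX `[[:digit:]]`, so only `\d` is shown.

The sample character U+0663 (ARABIC-INDIC DIGIT THREE) and the helper name `classify` are illustrative choices, not anything from the original thread.

```python
import re

# U+0663 ARABIC-INDIC DIGIT THREE: a Unicode decimal digit (category Nd)
# that is not one of the ASCII digits 0123456789. Chosen only as an example.
ARABIC_INDIC_THREE = "\u0663"


def classify(ch):
    # Hypothetical helper: report whether a single character matches
    #   1. the range [0-9] (code points U+0030..U+0039 only),
    #   2. \d with Python's default Unicode semantics for str patterns,
    #   3. \d restricted to ASCII via the re.ASCII flag.
    in_range = re.fullmatch(r"[0-9]", ch) is not None
    unicode_digit = re.fullmatch(r"\d", ch) is not None
    ascii_digit = re.fullmatch(r"\d", ch, re.ASCII) is not None
    return in_range, unicode_digit, ascii_digit


if __name__ == "__main__":
    for ch in ["7", ARABIC_INDIC_THREE, "a"]:
        print(repr(ch), classify(ch))
```

With these definitions, "7" matches all three patterns. U+0663 matches only Unicode `\d`, and "a" matches none of them. That is the same split the message describes for perl and pcre: the range stays ASCII-only while the digit class follows Unicode properties.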