Re: can [[:digit:]] match something other than 0123456789?

Stephane Chazelas Tue, 15 May 2018 15:26:59 -0700

2018-05-15 16:55:45 -0500, Eric Blake:
> On 05/15/2018 03:43 PM, Stephane Chazelas wrote:
> >
> >Does that mean that [0-9] is also guaranteed to match on
> >0123456789 only? And that then [[:digit:]] in regexp/fnmatch is
> >close to useless as it's longer than [0-9]
> 
> Yes, I think that's a fair conclusion for the C locale, by virtue of the
> fact that the standard requires the encoding for 0-9 to be contiguous and in
> order.
> 
> >and is a bit
> >misleading as it suggests it would be affected by localisation
> >(like the other character classes) while it's not.
> 
> It's still useful in non-C locales within regexp, since ALL uses of - for
> ranges within [] has unspecified (or was it implementation-defined)
> semantics outside of the C locale.  Using a named reference guarantees the
> desired semantics of exactly 10 characters, rather than skirting on the
> grounds of whether the range operator behaves as desired in all locales
> rather than just the C locale.
[...]


OK, so to rephrase and make sure I understand correctly: in
locales other than C, [[:digit:]] is guaranteed to match
0123456789 only, but [0-9] is not. 0123456789 are guaranteed to
be in that order, but what [0-9] matches is unspecified anyway
outside of the C locale.

That's a bit counter-intuitive and (as noted by @isaac at
https://unix.stackexchange.com/questions/414226/difference-between-0-9-digit-and-d/414230?noredirect=1#comment804362_414230)
is the opposite of what perl (in unicode mode), php (in unicode
mode) and pcre (with (*UCP)) do: their [0-9] matches 0123456789
only, while their \d/[[:digit:]] match based on Unicode
properties, so they also match decimal digits other than
0123456789.
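
To illustrate that second behaviour, here is a minimal sketch in
Python. Python is not one of the engines named above; it is used
here only on the assumption that its re module draws the same
line for str patterns: [0-9] is a plain code point range, while
\d follows the Unicode decimal digit property unless re.ASCII is
given. The sample string is made up for the example.

```python
import re

# Made-up sample: two ASCII digits, three Arabic-Indic digits
# (U+0661..U+0663), and one letter.
sample = "12\u0661\u0662\u0663x"

# An explicit range matches only the ten ASCII digits.
ascii_range = re.findall(r"[0-9]", sample)

# For str patterns, \d matches any Unicode decimal digit by default.
unicode_digits = re.findall(r"\d", sample)

# With re.ASCII, \d is restricted to [0-9] again.
ascii_only_d = re.findall(r"\d", sample, flags=re.ASCII)

print(len(ascii_range), len(unicode_digits), len(ascii_only_d))
```

This says nothing about POSIX regexp/fnmatch in a given locale;
it only shows the Unicode-property behaviour that the POSIX rule
is being contrasted with.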

-- 
Stephane
