Stephane Chazelas <[email protected]> wrote:
 |2018-05-15 16:55:45 -0500, Eric Blake:
 |> On 05/15/2018 03:43 PM, Stephane Chazelas wrote:
 |>>Does that mean that [0-9] is also guaranteed to match on
 |>>0123456789 only? And that then [[:digit:]] in regexp/fnmatch is
 |>>close to useless as it's longer than [0-9]
 |> 
 |> Yes, I think that's a fair conclusion for the C locale, by virtue of the
 |> fact that the standard requires the encoding for 0-9 to be contiguous
 |> and in order.
 |> 
 |>>and is a bit
 |>>misleading as it suggests it would be affected by localisation
 |>>(like the other character classes) while it's not.
 |> 
 |> It's still useful in non-C locales within regexp, since ALL uses of - for
 |> ranges within [] have unspecified (or was it implementation-defined)
 |> semantics outside of the C locale.  Using a named reference guarantees the
 |> desired semantics of exactly 10 characters, rather than skirting on the
 |> grounds of whether the range operator behaves as desired in all locales
 |> rather than just the C locale.
 |[...]
 |
 |OK, so to rephrase and make sure I understand correctly. In
 |locales other than C, [[:digit:]] will be guaranteed to match on
 |0123456789 only but not [0-9]. 0123456789 are guaranteed to be
 |in that order but [0-9] is unspecified anyway outside of the C
 |locale.
 |
 |That's a bit counter-intuitive and (as noted by @isaac at
 |https://unix.stackexchange.com/questions/414226/difference-between-0-9-digit-and-d/414230?noredirect=1#comment804362_414230)
 |is the opposite of what perl (in unicode mode), php (in unicode
 |mode), pcre (with (*UCP)) do: their [0-9] matches 0123456789
 |while their \d/[[:digit:]] match based on Unicode properties so
 |other decimal digits than the 0123456789 ones.

Unicode knows about decimal numbers, hexdigits and
ascii_hexdigit[s].  If I recall correctly, the defining property of
the first is that each script's decimal digits form a run of ten
consecutive code points carrying the values 0 through 9, even
though the glyphs may look different.  Given that property, it
makes sense to treat [0-9] as ASCII-compatible and to let
[:digit:] match whatever a language desires.
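
To make the distinction concrete, here is a minimal sketch in Python.
Python is not one of the tools named in this thread; I am only assuming
that its `re` module is a fair stand-in for the perl/pcre behaviour
described above, since for str patterns its \d follows the Unicode
decimal-number category (Nd) while an explicit [0-9] range does not.
The sample string and variable names are made up for illustration, and
none of this says anything about what POSIX regcomp()/fnmatch() do.

```python
import re
import unicodedata

# Illustrative sample: ASCII digits, the Arabic-Indic digits
# (U+0660..U+0669), and one non-digit character.
ascii_digits = "0123456789"
arabic_indic = "".join(chr(0x0660 + i) for i in range(10))
sample = ascii_digits + arabic_indic + "x"

# An explicit range covers only the code points U+0030..U+0039.
range_matches = re.findall(r"[0-9]", sample)

# For str patterns, \d matches any character in Unicode category Nd.
class_matches = re.findall(r"\d", sample)

# With re.ASCII, \d is restricted to the same set as [0-9].
ascii_class_matches = re.findall(r"\d", sample, flags=re.ASCII)

# Each Nd character carries a decimal value; within one script the
# ten digits are consecutive code points with values 0 through 9.
values = [unicodedata.decimal(c) for c in arabic_indic]

print(len(range_matches), len(class_matches), len(ascii_class_matches))
print(values)
```

Here the range finds the ten ASCII digits only, the plain \d finds all
twenty digits, and \d under re.ASCII falls back to the ASCII ten.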

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
