2018-05-22 13:49:20 +0100, Stephane CHAZELAS:
[...]
> In the case of the fnmatch and regexp of most systems, I don't
> know how they make so that [0-9] only matches on 0123456789 or
> [a-z] not on uppercase letters. Possibly, that's with special
> cases as well.
[...]

Sorry, my bad. It looks like I was basing my conclusions on tests I
thought I remembered doing but probably never did.

[0-9] matches on characters other than 0123456789 on many systems with
grep and system regexps as well.

On Solaris 10, in an en_GB.UTF-8 locale, with /usr/xpg4/bin/grep, it
matches on hundreds of different characters, many of which have nothing
to do with digits or are not even assigned in Unicode. Its [a-z]
matches on ABC...WXY and hundreds more, and even on parts of
characters, like the 0xf0..0xf4 bytes of characters U+10000 to
U+10FFFF.

On FreeBSD, [0-9] matches on U+2185 ROMAN NUMERAL SIX LATE FORM in
addition to 0123456789 (!?).

In GNU locales, whether [a-z] matches BCD..WXY or not depends on the
locale and the version of glibc. [0-9] does not always include only
0123456789 either. For instance, in a th_TH.UTF-8 locale,
grep '[a-z]' matches on M and grep '[0-9]' matches on

  U+0E50 THAI DIGIT ZERO
  U+0E51 THAI DIGIT ONE
  U+0E52 THAI DIGIT TWO
  U+0E53 THAI DIGIT THREE
  U+0E54 THAI DIGIT FOUR
  U+0E55 THAI DIGIT FIVE
  U+0E56 THAI DIGIT SIX
  U+0E57 THAI DIGIT SEVEN
  U+0E58 THAI DIGIT EIGHT

(note the missing DIGIT NINE, which would sort after 9).

So that confirms that it's not only a bash/ksh93 "issue": [0-9] cannot
be used to match 0123456789 only, and what it matches is random and
useless and not what one would ever want. [a-z] is not guaranteed to
match on lower case letters only, let alone
abcdefghijklmnopqrstuvwxyz only; it may even match on characters
outside the Latin script.

  LC_ALL=C grep '[0-9]'

would be OK, but not in locales that use charsets that have characters
that contain the encoding of digits (like GB18030, BIG5...).

It was requested that [[:digit:]] match only on 0123456789. While in
practice it seems to be the case for things that use the POSIX API,
it's not always the case outside of it (where [0-9] generally matches
on 0123456789 but [[:digit:]] can match on all sorts of decimal
digits).
For instance, with perl:

  perl -Mopen=locale -ne 'print if /[[:digit:]]/'

So it would seem that [0123456789] is the only portable way to match on
0123456789 only.

-- 
Stephane
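
A minimal sketch of that conclusion, assuming a POSIX sh and grep. The
function name has_ascii_digit is made up for illustration and does not
come from any existing tool. Both patterns list the ten digits
explicitly, so they contain no range whose meaning depends on the
locale's collation order.

```shell
#!/bin/sh
# has_ascii_digit is a hypothetical helper name used for illustration.
# Succeeds if its first argument contains at least one of 0123456789.
# Every digit is listed explicitly; no [0-9] range is involved.
has_ascii_digit() {
  case $1 in
    *[0123456789]*) return 0 ;;
    *) return 1 ;;
  esac
}

# The same explicit list used as a regexp bracket expression with grep:
# of the two input lines, only the one containing a digit is printed.
printf '%s\n' 'abc' 'a1b' | grep '[0123456789]'
```

The list is longer to type than [0-9], but it spells out exactly the
set of characters that is meant.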