2018-05-22 12:32:20 +0200, Joerg Schilling:
[...]
> > bash's [a-z] still matches on A..Y or B..Z though (source of
> > much confusion, many bugs and lots of ranting), and that
> > makes me realise that bash is actually one of those utilities
>
> This strange and unexpected behavior did cause once that bash
> removed important files for me. Sorry, I don't remember which
> locale I used at that time.
>
> I would call this behavior a security risk.
[...]
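
To illustrate the A..Y / B..Z effect quoted above, here is a minimal
Python sketch. It assumes two made-up collation orders (aAbB...zZ and
AaBb...Zz) as stand-ins for a real locale. The names LOWER_FIRST,
UPPER_FIRST, in_collation_range and in_codepoint_range are mine and
not anything from bash, and real locale collation has many more
elements and several weight levels. It is not how bash implements
ranges, only the "range by sort position" idea, with a code point
comparison added for contrast.

```python
import string

# Hypothetical collation orders, for illustration only.
# LOWER_FIRST is "aAbBcC...zZ", UPPER_FIRST is "AaBbCc...Zz".
LOWER_FIRST = "".join(
    l + u for l, u in zip(string.ascii_lowercase, string.ascii_uppercase)
)
UPPER_FIRST = "".join(
    u + l for l, u in zip(string.ascii_lowercase, string.ascii_uppercase)
)


def in_collation_range(ch, start, end, order):
    """True if ch sorts between start and end (inclusive) in `order`."""
    if ch not in order:
        return False
    return order.index(start) <= order.index(ch) <= order.index(end)


def in_codepoint_range(ch, start, end):
    """True if ch lies between start and end (inclusive) by code point."""
    return ord(start) <= ord(ch) <= ord(end)


def uppercase_matches(order):
    """Uppercase ASCII letters that fall inside [a-z] under `order`."""
    return "".join(
        c for c in string.ascii_uppercase
        if in_collation_range(c, "a", "z", order)
    )


print(uppercase_matches(LOWER_FIRST))  # ABCDEFGHIJKLMNOPQRSTUVWXY
print(uppercase_matches(UPPER_FIRST))  # BCDEFGHIJKLMNOPQRSTUVWXYZ
print(any(in_codepoint_range(c, "a", "z") for c in string.ascii_uppercase))  # False
```

With the first order Z sorts after z, so it falls outside [a-z]; with
the second, A sorts before a. Compared by code point, no uppercase
letter falls inside the range at all.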

Note that (AFAICT from testing) ksh93 behaves like bash in that
its ranges are based on collation order, but it has an extra
feature in that

- if both ends of the range are lowercase letters (or collating
  elements whose first character is a lowercase letter), then
  [<start>-<end>] matches on collating elements in between
  <start> and <end> PROVIDED their first character is lowercase.
  That's why for instance m is matched by [a-z], [A-z], [a-Z]
  but not [A-Z], and why in a Hungarian locale on a GNU system,
  Dz is matched by [A-Z] (even though it contains a lowercase
  letter) and not [a-z].
- and the corresponding case for uppercase letters

In the case of the fnmatch and regexp of most systems, I don't
know how they make it so that [0-9] only matches on 0123456789
or that [a-z] doesn't match uppercase letters. Possibly, that's
with special cases as well.

Note that GNU grep/sed do match Dz with [A-Z] in Hungarian
locales, but GNU "find -name '[A-Z]'" doesn't (fnmatch doesn't
seem to handle collating elements there).

zsh's ranges are based on byte value in locales with single-byte
charsets and unicode codepoint (wide character, which probably
corresponds to unicode code point on all systems where zsh has
been ported) in multi-byte ones. To me, that's the most useful
approach (also the one of most modern languages).

-- 
Stephane