Re: can [[:digit:]] match something other than 0123456789?

Stephane CHAZELAS Tue, 22 May 2018 05:50:00 -0700

2018-05-22 12:32:20 +0200, Joerg Schilling:
[...]
> > bash's [a-z] still matches on A..Y or B..Z though (source of
> > much confusion, many bugs and lots of ranting), and that
> > makes me realise that bash is actually one of those utilities
> 
> This strange and unexpected behavior once caused bash to
> remove important files of mine. Sorry, I don't remember which
> locale I used at that time.
> 
> I would call this behavior a security risk.
[...]


Note that (AFAICT from testing) ksh93 behaves like bash in that
its ranges are based on collation order, but it adds a special
case:

- if both ends of the range are lowercase letters (or collating
  elements whose first character is a lowercase letter), then
  [<start>-<end>] matches on collating elements in between
  <start> and <end> PROVIDED their first character is lowercase.

  That's why, for instance, m is matched by [a-z], [A-z] and
  [a-Z] but not by [A-Z], and why, in a Hungarian locale on a GNU
  system, Dz is matched by [A-Z] (even though it contains a
  lowercase letter) but not by [a-z].

- and correspondingly when both ends are uppercase letters: only
  collating elements whose first character is uppercase match.
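
To make the rule above concrete, here is a rough Python sketch. It
is only a model of the behaviour as I observed it, not ksh93's
actual code. The collation order is a made-up toy one (aAbB...zZ,
with "dz" and "Dz" inserted after d/D as stand-ins for Hungarian
collating elements), and the function names are hypothetical.

```python
import string


def toy_collation_order():
    """Build a made-up collation order: a A b B ... z Z.

    "dz" and "Dz" are added after d/D as stand-ins for multi-character
    collating elements. This is an assumption for illustration, not a
    real locale definition.
    """
    order = []
    for c in string.ascii_lowercase:
        order += [c, c.upper()]
        if c == "d":
            order += ["dz", "Dz"]
    return order


RANK = {elem: i for i, elem in enumerate(toy_collation_order())}


def collation_range_match(elem, start, end):
    """Plain collation-order range: no case filtering at all."""
    return RANK[start] <= RANK[elem] <= RANK[end]


def ksh93_like_range_match(elem, start, end):
    """Collation-order range plus the case filter described above."""
    if not collation_range_match(elem, start, end):
        return False
    if start[0].islower() and end[0].islower():
        # Both ends lowercase: first character must be lowercase.
        return elem[0].islower()
    if start[0].isupper() and end[0].isupper():
        # Both ends uppercase: first character must be uppercase.
        return elem[0].isupper()
    # Mixed-case ends: no extra filtering.
    return True


if __name__ == "__main__":
    for start, end in [("a", "z"), ("A", "z"), ("a", "Z"), ("A", "Z")]:
        print(f"[{start}-{end}]",
              "m:", ksh93_like_range_match("m", start, end),
              "Dz:", ksh93_like_range_match("Dz", start, end))
```

Without the case filter (collation_range_match), this toy order
puts A..Y inside [a-z] and leaves Z out, which is the kind of
surprise discussed for bash.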


In the case of the fnmatch() and regexp implementations of most
systems, I don't know how they ensure that [0-9] only matches
0123456789, or that [a-z] doesn't match uppercase letters.
Possibly that's done with special cases as well. Note that GNU
grep/sed do match Dz with [A-Z] in Hungarian locales, but GNU
"find -name '[A-Z]'" doesn't (fnmatch doesn't seem to handle
collating elements there).

zsh's ranges are based on byte value in locales with single-byte
charsets and on wide character value in multi-byte ones (which
probably corresponds to the Unicode code point on all systems
zsh has been ported to). To me, that's the most useful approach
(and also the one taken by most modern languages).

-- 
Stephane
