Paul Eggert wrote:
> For this particular task "grep for non-ASCII characters", I had just
> two days before tried to solve the same problem, and discovered that
> 'grep', somewhat to my surprise, can't do it. This is worth either
> mentioning or fixing, in my opinion.

According to the Open Group spec for Regular Expressions, which is a standard I
assume we should generally be following:
<http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_03_05>

    The following character class expressions shall be supported in all locales:

        [:alnum:]   [:cntrl:]   [:lower:]   [:space:]
        [:alpha:]   [:digit:]   [:print:]   [:upper:]
        [:blank:]   [:graph:]   [:punct:]   [:xdigit:]

    In addition, character class expressions of the form:

        [:name:]

    are recognized in those locales where the name keyword has been given
    a charclass definition in the LC_CTYPE category.

Therefore "grep '[^[:ascii:]]'" ought to work as expected iff the current
locale defines that class.
Whether it DOES work is something I haven't tried to determine.
Whether Grep should support that class unconditionally, as Perl does, is
another matter. I'd say probably not; there's probably a reason why it's not
in the list of standard classes.
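
For reference, here is a rough sketch of what a "[^[:ascii:]]" search is
meant to find, written outside grep so that it does not depend on any
locale's LC_CTYPE definitions. It assumes "non-ASCII" means any character
above U+007F, which is how I understand Perl's class; Python's re module has
no POSIX bracket classes, so the range is spelled out. The names NON_ASCII
and non_ascii_lines are made up for this illustration.

```python
import re

# Assumption: "non-ASCII" means any code point above U+007F.
# Python's re module does not support [:name:] classes, so the
# complement of the 7-bit range is written as an explicit range.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def non_ascii_lines(lines):
    """Yield (line_number, line) for each line containing a non-ASCII character."""
    for number, line in enumerate(lines, start=1):
        if NON_ASCII.search(line):
            yield number, line
```

Called on the lines of a text file, this yields the 1-based line number and
text of each offending line, much like "grep -n" would.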
The Grep manual should be more explicit about the use of character classes
other than those that it says are supported.
- Julian