James G. Sack (jim) wrote:
For further thought, the only specials within brackets ought to be '-'
(for ranges) and ']' (the end-delimiter for the character class).
Then you kinda have to add '\' to the specials so that you can write
'\]' to mean a literal ']'. You can also use '\-' and, of course '\\'.
By convention, putting the '-' as the first or last character in the
brackets also means a literal '-'. I suppose putting ']' as the first
character might logically also mean a literal ']' (since otherwise you
have an empty class -- maybe it actually works that way?). Arguably,
using '\-', '\]' is better than remembering additional conventions, but
I mention it so that you will recognize them when you see them.
There are a few characters which have special significance in bracket
expressions. For one, the caret (^) character at the beginning of the
expression makes it a non-matching list, i.e. one that matches any
character not listed in the expression. Also significant is the ]
character; to make it part of an expression it must be the first
character in the class (after the ^, if there is one); that is,
expressions of the form []a-zA-Z] and [^]a-zA-Z] are syntactically
correct (you were right in this case). To include a literal - character,
it must either be the first or last character in the class or the second
endpoint of a range ([%--]). All other special characters lose their
significance (including \, so '\]' and '\-' do not work as escapes in
POSIX bracket expressions, though Perl-style regex engines do accept
them). There are also collating elements (beginning with [. and ending
with .]) and equivalence classes (beginning with [= and ending with =])
that can be put into bracket expressions. There are also a bunch of
predefined character classes that come in very handy. Their names are
standard, but exactly which characters each one matches depends on the
locale. They are enclosed in [: and :] within bracket expressions:
alnum -- matches all alphanumeric characters (equivalent to [0-9A-Za-z]
  in the 'C' locale)
alpha -- matches all upper- and lower-case letters (eq. [A-Za-z])
blank -- matches a space or tab
cntrl -- matches any control character (ASCII 0x00-0x1F and 0x7F)
digit -- matches any digit character (eq. [0-9])
graph -- matches any printable character except space (ASCII 0x20)
lower -- matches lower-case letters (eq. [a-z])
print -- matches any printable character (including ASCII 0x20)
punct -- matches punctuation characters (any character in [:graph:]
  but not in [:alnum:])
space -- matches whitespace (space, formfeed, newline, carriage return,
  tab, vertical tab)
upper -- matches upper-case letters (eq. [A-Z])
xdigit -- matches hexadecimal digits (eq. [0-9A-Fa-f])
To use these character classes in a bracket expression, put the [:name:]
form inside the outer brackets, like so:
s/[[:space:]]/ /g
will replace each whitespace character with a single space.
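To see what that substitution does, here is a small sketch you can paste
into a shell. The function names ws_to_space and squeeze_ws are made up
for illustration, and it assumes a sed that understands POSIX character
classes (GNU and BSD sed both do, as far as I know). The second function
is a variation that collapses a whole run of whitespace into one space,
which is often what people actually want:

```shell
# Replace every whitespace character with one space (one-for-one).
# Function name is hypothetical, for illustration only.
ws_to_space() {
    sed 's/[[:space:]]/ /g'
}

# Collapse each run of one or more whitespace characters into a single
# space. 'X X*' is the basic-regex spelling of "one or more X".
squeeze_ws() {
    sed 's/[[:space:]][[:space:]]*/ /g'
}

# Input is: a, two tabs, b, two spaces, c
printf 'a\t\tb  c\n' | ws_to_space   # each tab becomes its own space
printf 'a\t\tb  c\n' | squeeze_ws    # each run becomes one space
```

Note that sed works a line at a time, so the newline at the end of each
line is never part of what [[:space:]] gets to match here.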
(Read wctype(3), isalpha(3), and regex(7) for more on these.)
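If you want to check the rules about literal ], - and \ for yourself,
something like the following should do it. This is only a sketch: the
helper name matches is my own invention, and it assumes a POSIX-style
grep. LC_ALL=C is set so that ranges follow plain ASCII order.

```shell
# Hypothetical helper: succeeds if the string in $2 contains at least one
# character matched by the bracket expression in $1.
matches() {
    printf '%s\n' "$2" | LC_ALL=C grep -q -e "$1"
}

# ']' placed first in the list is a literal ']'.
matches '[]]'     'a]b' && echo "']' first in the list is literal"

# '^' negates the list; every character of 'a]b' is in the list, so
# nothing matches and the message after || is printed.
matches '[^]a-z]' 'a]b' || echo "'a]b' has nothing outside the list"

# '-' placed last in the list is a literal '-'.
matches '[0-9-]'  'a-b' && echo "'-' last in the list is literal"

# '-' as the second endpoint of a range: '%' through '-' includes '+'.
matches '[%--]'   'x+y' && echo "range ending in '-' works"

# Backslash is an ordinary character inside a POSIX bracket expression.
matches '[\]'     'a\b' && echo "backslash is not an escape here"
```

Tools built on Perl-style regexes behave differently on that last one,
so it pays to know which flavor you are feeding the pattern to.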
Regular expressions are a tough thing to master. Keep working at it,
though, and pretty soon those strings of what originally look like line
noise will become clear :)
cheers,
-kelsey