Re: What I sed

kelsey hudson Mon, 19 Feb 2007 11:26:44 -0800

James G. Sack (jim) wrote:

For further thought, the only specials within brackets ought to be '-'
(for ranges) and ']' (the end-delimiter for the character class).
Then you kinda have to add '\' to the specials so that you can write
'\]' to mean a literal ']'. You can also use '\-' and, of course '\\'.
By convention, putting the '-' as the first or last character in the
brackets also means a literal '-'. I suppose putting ']' as the first
character might logically also mean a literal ']' (since otherwise you
have an empty class -- maybe it actually works that way?). Arguably,
using '\-', '\]' is better than remembering additional conventions, but
I mention it so that you will recognize them when you see them.

There are a few characters which have special significance in bracket expressions. For one, the caret (^) character at the beginning of the expression makes it an exclusive list, i.e. one that matches any character not listed in the expression. Also significant is the ] character; to make it part of an expression it must be the first character in the class (right after the ^, if there is one); that is, expressions of the form []a-zA-Z] and [^]a-zA-Z] are syntactically correct (you were right in this case). To include a literal - character, it must either be the first or last character in the class or the second endpoint of a range (e.g. [+--], which in the 'C' locale matches +, comma and -). All other special characters lose their significance (including \, so it cannot be used to escape ] or - here).

There are also collating elements (beginning with [. and ending with .]) and equivalence classes (beginning with [= and ending with =]) that can be put into bracket expressions.

There are also a bunch of named character classes that come in very handy (and are enclosed in [: and :] within bracket expressions). The names are the same in every locale, but exactly which characters each one matches depends on the current locale:
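To make those rules concrete, here is a rough sketch you can paste into a shell. The sample strings and variable names are made up for illustration, and it assumes a POSIX-ish sed (GNU or BSD). Each command replaces whatever the bracket expression matched with an X:

```shell
# A leading ^ negates the list: everything except a, b and c is replaced.
negated=$(printf '%s\n' 'abc-123' | sed 's/[^abc]/X/g')
echo "$negated"      # abcXXXX

# A ] placed first is a literal member of the list.
rbracket=$(printf '%s\n' 'a]b[c' | sed 's/[]]/X/g')
echo "$rbracket"     # aXb[c

# A - placed last is a literal, not a range operator.
hyphen=$(printf '%s\n' 'a-b_c' | sed 's/[_-]/X/g')
echo "$hyphen"       # aXbXc

# \ is not special inside brackets, so [\.] matches a backslash or a dot.
backslash=$(printf '%s\n' 'a\b.c' | sed 's/[\.]/X/g')
echo "$backslash"    # aXbXc
```

One caveat: GNU sed, as an extension, still treats escapes such as \n and \t specially inside brackets, which is why the last example uses \. rather than \n.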

alnum  -- matches all alphanumeric characters (equivalent to [0-9A-Za-z] in the 'C' locale)
alpha  -- matches all upper- and lower-case letters (eq. [A-Za-z])
blank  -- matches a space or tab
cntrl  -- matches any control character (ASCII 0x00-0x1F and 0x7F)
digit  -- matches any digit character (eq. [0-9])
graph  -- matches any printable character except space (ASCII 0x20)
lower  -- matches lower-case characters (eq. [a-z])
print  -- matches any printable character (including ASCII 0x20)
punct  -- matches punctuation characters (any character not in [:alnum:] but in [:graph:])
space  -- matches whitespace (equivalent to space, formfeed, newline, carriage return, tab, vertical tab)
upper  -- matches upper-case characters (eq. [A-Z])
xdigit -- matches characters which may be hexadecimal digits (eq. [0-9A-Fa-f])
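A few of these classes in action, as a sketch rather than anything definitive. The input strings and variable names are invented for the example, and LC_ALL=C is set so that the classes match only the ASCII sets listed above:

```shell
# [:digit:] -- mask every digit.
digits=$(printf '%s\n' 'Order 66, aisle 7' | LC_ALL=C sed 's/[[:digit:]]/#/g')
echo "$digits"     # Order ##, aisle #

# [:upper:] -- replace upper-case letters.
uppers=$(printf '%s\n' 'Hello World' | LC_ALL=C sed 's/[[:upper:]]/_/g')
echo "$uppers"     # _ello _orld

# A class can share the brackets with ordinary list members (here, _).
mixed=$(printf '%s\n' 'a1_b2-c3' | LC_ALL=C sed 's/[[:alpha:]_]//g')
echo "$mixed"      # 12-3

# [:punct:] -- strip punctuation.
puncts=$(printf '%s\n' 'a,b;c!' | LC_ALL=C sed 's/[[:punct:]]//g')
echo "$puncts"     # abc
```

Note the doubled brackets: the outer pair is the bracket expression, the inner [: :] pair names the class.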

To use these character classes in a bracket expression, put the [:name:] inside the brackets, like so:

s/[[:space:]]/ /g

will replace every occurrence of a character in that class with a single space.
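Since that command works one character at a time, a run of whitespace turns into a run of spaces. A small sketch of the difference, with a made-up input line and variable names; the second command is a common variation, not something from the original post:

```shell
# printf expands \t in its format string, so the input line is: a, tab, space, b.
each=$(printf 'a\t b\n' | sed 's/[[:space:]]/ /g')
echo "$each"        # "a  b" -- the tab and the space each become one space

# Matching one-or-more whitespace characters collapses the whole run instead.
squeezed=$(printf 'a\t b\n' | sed 's/[[:space:]][[:space:]]*/ /g')
echo "$squeezed"    # "a b"
```

The trailing newline is untouched in both cases because sed strips it from each line before applying the command and adds it back afterwards.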


(read wctype(3), isalpha(3), and regex(7) for more on these).

Regular expressions are a tough thing to master. Keep working at it, though, and pretty soon those strings of what originally look like line noise will become clear :)


cheers,
-kelsey


--
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list
