: Some of the files have HTML headers and footers.  I don't want any data
: inside HTML brackets.   I tried:
: 
:   s/<*>//g; 
: 
: I don't understand why this doesn't work.

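Because (a) "<*" in a regex means "zero or more less-thans" (the "*" is
not a shell-style wildcard), so "<*>" only ever matches a ">" plus any
"<"s sitting right in front of it, and (b) Perl regex matching is greedy:
it matches the longest string that can make the match true, so simply
switching to "<.*>" would eat everything from the first < to the last >.
What you want to do is something like this: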

        s/<.*?>//g;

In this one, ".*" means "zero or more of anything" (anything except a
newline, by default), which under normal circumstances would mean > as
well, except that ".*?" means "match the shortest string that makes the
regex true".  So "<.*?>" will match from each < to the nearest > after it.
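
To make the difference concrete, here is a small sketch that runs the
original pattern, a greedy version, and the non-greedy version over the
same string. The sample line and the strip() helper are made up for
illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample line, not taken from the poster's files.
my $html = '<b>bold</b> and <i>italic</i>';

# Hypothetical helper: return a copy of $text with every match of
# $pattern removed, leaving the original string untouched.
sub strip {
    my ($text, $pattern) = @_;
    (my $copy = $text) =~ s/$pattern//g;
    return $copy;
}

my $original = strip($html, qr/<*>/);    # removes only the ">" characters
my $greedy   = strip($html, qr/<.*>/);   # first "<" through the last ">"
my $lazy     = strip($html, qr/<.*?>/);  # each tag on its own

print "original: [$original]\n";  # [<bbold</b and <iitalic</i]
print "greedy:   [$greedy]\n";    # []
print "lazy:     [$lazy]\n";      # [bold and italic]
```

The greedy version is the one to watch out for: it does not fail, it
just quietly deletes the text between the tags as well.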

Alternatively, you could do this:

        s/<[^>]*>//g;

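which says "match a <, followed by zero or more characters that aren't
>, and then a >". Unlike ".", the class "[^>]" also matches a newline, so
this version copes with a tag that is split across two lines. I think the
first looks clearer on the page, but the second sounds more obvious when
you read it out loud.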
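
As a rough illustration of where the two differ, the sketch below feeds
both patterns a tag that wraps onto a second line. The input string is
invented for the example, and it assumes the whole text is held in one
scalar rather than being read a line at a time.

```perl
use strict;
use warnings;

# Hypothetical input: an <a> tag whose attribute wraps onto a second line.
my $html = "<a\n  href=\"x.html\">link</a> text";

# "." does not match a newline by default, so the wrapped tag survives
# and only the one-line </a> is removed.
(my $lazy = $html) =~ s/<.*?>//g;

# The /s modifier lets "." match a newline, so both tags are removed.
(my $lazy_s = $html) =~ s/<.*?>//gs;

# A negated character class matches a newline without any modifier.
(my $negated = $html) =~ s/<[^>]*>//g;

print "lazy:    [$lazy]\n";
print "lazy /s: [$lazy_s]\n";
print "negated: [$negated]\n";
```

If the files are processed one line at a time, neither pattern can see
a tag that starts on one line and ends on the next, so the text would
need to be read in larger chunks first.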

: Thanks for any help.  (Actually, thanks for writing my program for me;
: although I'm trying hard to do it myself.)   ;-)

oh... in that case, ignore everything we've said. ;)
--
Tim Kimball · ACDSD / MAST        ¦ 
Space Telescope Science Institute ¦ We are here on Earth to do good to others.
3700 San Martin Drive             ¦ What the others are here for, I don't know.
Baltimore MD 21218 USA            ¦                           -- W.H. Auden
