In that case you'd need to use a program that is not line-oriented.
Something like 

#!/usr/bin/env python2
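import re, sys
# \s+ matches any whitespace, including the newline that splits the attributes
pat = re.compile(r'\s+width="51"\s+height="20"')
# read the whole input file at once so the match can span lines
ifd = open(sys.argv[1], 'r')
ibuf = ifd.read()
ifd.close()
# delete every occurrence of the attribute pair
obuf = pat.sub('', ibuf)
ofd = open(sys.argv[2], 'w')
ofd.write(obuf)
ofd.close()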

(WARNING -- untested code!!) It reads the entire HTML file, processes
it in one fell swoop, and writes the result out. The first argument is
the input file and the second is the output file. Line breaks are
always a pain.
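
Since you have about 50 files, a variant that loops over every file named on the command line and rewrites each one in place might be handier. This is only a sketch, equally untested against your real files: the helper name strip_size_attrs is made up for illustration, and it assumes the two attributes are separated by nothing but whitespace (spaces, tabs, or a line break).

```python
import re
import sys

# Matches the attribute pair with any run of whitespace (including a
# newline) before "width" and between the two attributes.
PAT = re.compile(r'\s+width="51"\s+height="20"')

def strip_size_attrs(html):
    """Return html with every width="51" height="20" pair removed.

    Hypothetical helper name, used here only for illustration.
    """
    return PAT.sub('', html)

def main(paths):
    for path in paths:
        # Slurp the whole file so a match can span a line break.
        with open(path, 'r') as f:
            text = f.read()
        cleaned = strip_size_attrs(text)
        # Only rewrite files that actually changed.
        if cleaned != text:
            with open(path, 'w') as f:
                f.write(cleaned)

if __name__ == '__main__':
    main(sys.argv[1:])
```

Saved under a name of your choosing and run with *.html as arguments, it overwrites the originals, so keep a backup copy of the directory first.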

On Tue, 30 Dec 2003, Andrew Gaffney wrote:

> Andrew Gaffney wrote:
> > Andrew Gaffney wrote:
> > 
> >> I need to strip out the string ' width="51" height="20"' from about 50 
> >> HTML documents. Is there a simple way to do this with a bash/sed or 
> >> perl one-liner?
> > 
> > 
> > Never mind, from googling, I was able to find:
> > 
> > perl -pi -e 's/ width="51" height="20"//' *.html
> 
> Although, there is one case this doesn't work for. In some of the HTML files, the
> text I'm looking to strip is split over 2 lines like:
> 
> <a href="someurl"><img src="button.gif" border="0" width="51"
>          height="20"></a>
> 
> How would I strip the text in this case?
> 
> -- 
> Andrew Gaffney
> 
> 
> --
> [EMAIL PROTECTED] mailing list
> 


