On Tue, 10 Jul 2001, Sean M. Burke wrote:
> They add: "We recommend that authors avoid using all of these features."
> Pretty strong words, considering the W3's usual delusional mind-set about,
> well, nearly anything to do with user-agents. ("You mean Netscape doesn't
> have nsgmls in its parser?!?")
I just erase them and all the other junk that is created. I haven't had
the need to run this script on many documents, but it's done a remarkable
job on the ones I've seen so far.
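#!/usr/bin/perl -w -i.bak
# Strip Word-generated HTML clutter. -i.bak edits each file named on the
# command line in place and keeps the original as file.bak.
use strict;

# Read blank-line-separated chunks so patterns can match tags that are
# wrapped across several lines.
$/ = "\n\n";
while (<>) {
    # get rid of the style sheet stuff
    s/<span.*?>//gs;
    s/<\/span>//g;
    s/class=\w+//g;
    s/style='.*?'//gs;

    # junk
    s/<o:p><\/o:p>//g;
    s/<!\[if !supportEmptyParas\]> <!\[endif\]>/ /gs;
    # non-greedy, so each conditional block is unwrapped on its own
    s/<!\[if !supportLists\]>(.*?)<!\[endif\]>/$1/gs;

    # do trailing blanks: spaces before '>', then at the end of every line
    s/\s+>/>/g;
    s/[ \t]+$//mg;
    print;
}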
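To check what the substitutions do without editing any files, the sketch below runs the same rules over a string. The helper name `clean_word_html` and the sample markup are made up for illustration. The sample only resembles Word's export and was not taken from a real document. The sketch uses `(.*?)` for the supportLists rule so that each conditional block is unwrapped separately. A greedy `(.*)` would span from the first `<![if` to the last `<![endif]>` in a chunk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: apply the cleanup rules to one string, not a file.
sub clean_word_html {
    my ($html) = @_;
    for ($html) {    # alias $_ to $html so each s/// edits it
        s/<span.*?>//gs;
        s/<\/span>//g;
        s/class=\w+//g;
        s/style='.*?'//gs;
        s/<o:p><\/o:p>//g;
        s/<!\[if !supportEmptyParas\]> <!\[endif\]>/ /gs;
        s/<!\[if !supportLists\]>(.*?)<!\[endif\]>/$1/gs;
        s/\s+>/>/g;
        s/[ \t]+$//mg;
    }
    return $html;
}

# Invented sample input resembling Word's HTML export.
my $sample = qq{<p class=MsoNormal style='margin:0in'><span style='font-size:12pt'>Hello</span><o:p></o:p></p>  \n};
print clean_word_html($sample);    # prints "<p>Hello</p>"
```

Run on two list markers such as `<![if !supportLists]>1.<![endif]>One <![if !supportLists]>2.<![endif]>Two`, the helper returns `1.One 2.Two`.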
--
Matthew Darwin
Community Volunteer
[EMAIL PROTECTED]
http://www.davin.ottawa.on.ca/~matthew/