On Tue, 10 Jul 2001, Sean M. Burke wrote:

> They add: "We recommend that authors avoid using all of these features."
> Pretty strong words, considering the W3's usual delusional mind-set about,
> well, nearly anything to do with user-agents.  ("You mean Netscape doesn't
> have nsgmls in its parser?!?")

I just erase them and all the other junk that is created.  I haven't
needed to run this script on many documents, but it has done a remarkable
job on the ones I've tried so far.



#!/usr/bin/perl -w -i.bak

$/ = "\n\n";    # read the input one blank-line-separated paragraph at a time

while (<>) {
        # get rid of the style sheet stuff
        s/<span.*?>//gs;
        s/<\/span>//g;
        s/class=\w+//gs;
        s/style=\'.*?\'//gs;

        # junk
        s/<o:p><\/o:p>//g;
        s/<!\[if !supportEmptyParas\]>&nbsp;<!\[endif\]>/&nbsp;/gs;
        s/<!\[if !supportLists\]>(.*?)<!\[endif\]>/$1/gs;

        # do trailing blanks
        s/\s+>/>/g;
        s/ +$//mg;      # /m so that $ matches at the end of every line

        print;
}
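
To see the effect on a single paragraph, here is a small self-contained
sketch that applies essentially the same substitutions to a made-up
fragment of the kind of HTML that Word saves.  The sample markup and the
name clean_word_html are only for illustration and are not part of the
script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: runs the cleanup substitutions on one paragraph
# of text and returns the cleaned result.
sub clean_word_html {
    my ($p) = @_;

    # get rid of the style sheet stuff
    $p =~ s/<span.*?>//gs;
    $p =~ s/<\/span>//g;
    $p =~ s/class=\w+//gs;
    $p =~ s/style='.*?'//gs;

    # junk
    $p =~ s/<o:p><\/o:p>//g;
    $p =~ s/<!\[if !supportEmptyParas\]>&nbsp;<!\[endif\]>/&nbsp;/gs;
    $p =~ s/<!\[if !supportLists\]>(.*?)<!\[endif\]>/$1/gs;

    # do trailing blanks
    $p =~ s/\s+>/>/g;
    $p =~ s/ +$//mg;

    return $p;
}

# Made-up sample input, not taken from a real document.
my $sample = "<p class=MsoNormal style='margin-left:.5in'>"
           . "<span style='font-family:Arial'>Hello</span><o:p></o:p></p>\n";

print clean_word_html($sample);
```

For this sample the span, class, style and o:p markup all disappear, and
the leftover blanks inside the opening tag are closed up by the
trailing-blank step.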


-- 
Matthew Darwin
Community Volunteer
[EMAIL PROTECTED]
http://www.davin.ottawa.on.ca/~matthew/
