Our editing department does a Save As... *.html on all the MS-Word files we publish. In the process, however, every line in the resulting HTML file ends with a paragraph mark. I am trying to write a script that deletes HTML tags that span newlines (which I got to work), and also tags that span paragraph marks.
What I have so far is below. The two substitutions marked with #*** are examples of tags that span multiple lines and, in the process, span the paragraph marks. Also, I know it is not actually *doing* anything yet; I am still in the testing phase, which is why all the color constants are there...

____________________

#!/usr/bin/perl
use warnings;
use strict;
use Term::ANSIColor qw(:constants);
$Term::ANSIColor::AUTORESET = 1;

# $/ = "";   ### I tried it with this uncommented; the whole file becomes one big "paragraph", and nothing matches.

# $. is the current input line number; it is printed in front of each match.
while (<>) {
    # remove weird paragraph marks
    s/<\/?o:p>//msgi && print "$.: $`", ON_MAGENTA "|$&|", RESET "$'\n";

    # remove unnecessary closing tags
    s/<\/b>//msgi && print "$.: $`", YELLOW "|$&|", RESET "$'\n";
    s/<\/span>//msgi && print "$.: $`", ON_GREEN "|$&|", RESET "$'\n";

    # remove mso-spaceruns
    s/<span\s*(\S+\s*\S+)\">/ /msgi && print "$.: $`", ON_RED "|$&|", RESET "$'\n";   #*** this is one tag that spans multiple lines

    # remove mso image data
    s/<!--\[if gte vml 1\]>.*<!\[endif\]-->//msgi && print "$.: $`", GREEN "|$&|", RESET "$'\n";   #*** this is one tag that spans multiple lines
    s/(v:shapes\S+\s)//msgi && print "$.: $`", ON_BLUE "|$&|", RESET "$'\n";
}
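
One direction that might be worth trying, as a rough sketch rather than a tested fix: read the whole file into a single string (undefine $/ instead of setting it to "") and match with /s so that . can cross line ends. This assumes the "paragraph marks" are carriage returns left over from Windows line endings, which has not been confirmed. The name strip_word_markup, the simplified v:shapes pattern, and the sample string are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: cleans one whole document held in a single string.
sub strip_word_markup {
    my ($html) = @_;

    # Assumption: the "paragraph marks" are carriage returns (CRLF line endings).
    $html =~ s/\r\n?/\n/g;

    # Remove <o:p> and </o:p>.
    $html =~ s{</?o:p>}{}gi;

    # Remove conditional VML blocks. /s lets . match newlines, and the
    # non-greedy .*? stops at the first endif instead of the last one.
    $html =~ s{<!--\[if gte vml 1\]>.*?<!\[endif\]-->}{}gis;

    # Remove quoted v:shapes attributes (simplified pattern, an assumption).
    $html =~ s{\s+v:shapes="[^"]*"}{}gi;

    return $html;
}

# Made-up sample with a tag that spans several CRLF-terminated lines.
my $sample = "<p>Hello<o:p></o:p>\r\n"
           . "<!--[if gte vml 1]><v:shape\r\n id=\"x\"/>\r\n<![endif]-->world</p>\r\n";
print strip_word_markup($sample);

# On real files the input could be slurped with:
#   my $html = do { local $/; <> };
```

Because the substitutions run on the whole document at once, a tag that is broken across several lines is seen as one match, which a line-by-line while (<>) loop cannot do.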