Thanks, Chad, I'll try it. I've been down and out with a nasty cold. I'm sort of back now.

Chad I. Uretsky wrote:
After I sent my original e-mail, I decided it might be better to stick with
something more along the lines of your original regexp.  So the following
will work as well, and actually better, since it will also eliminate your
custom tags if anything other than the tag name is entered (this is
accomplished by making the negative look-ahead assertion require the closing
">" right after the tag name, which the assertion I previously gave you did
not do):

        $allowed_tags = "TYPE|DESCRIPTION";
        s/<(?!(?:$allowed_tags)>)(?:[^>'"]*|(['"]).*?\1)*>//sig;

Note that the outer group is non-capturing, (?:...), so that \1 still refers
to the quote character matched by (['"]), and that the tag list sits in its
own (?:...) so the "|" cannot split the assertion.

I found that the original negative look-ahead assertion still permitted
"custom" tags of the format: <TYPE abc="def"> - if you wish to permit
something like this, then end the assertion on a word boundary,
(?!(?:{expr})\b), as opposed to on the closing bracket, (?!(?:{expr})>).

Again, I keep the tags which you allow in a variable as opposed to directly
in the regexp only for simplified management.  You could just as easily
write the regexp as:

        s/<(?!(?:TYPE|DESCRIPTION)>)(?:[^>'"]*|(['"]).*?\1)*>//sig;
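
For anyone who wants to try the two behaviours side by side, here is a minimal sketch. It assumes look-ahead assertions with a non-capturing outer group (so that \1 is the quote character); the names $strict, $lenient and strip_tags, and the sample string, are made up for illustration. Closing tags such as </TYPE> are not protected by either form.

```perl
use strict;
use warnings;

my $allowed_tags = "TYPE|DESCRIPTION";

# Strict: a tag is protected only when its name is followed directly by ">".
my $strict  = qr/<(?!(?:$allowed_tags)>)(?:[^>'"]*|(['"]).*?\1)*>/si;

# Lenient: a tag is protected whenever its name is one of the allowed words,
# so attributes after the name are tolerated.
my $lenient = qr/<(?!(?:$allowed_tags)\b)(?:[^>'"]*|(['"]).*?\1)*>/si;

# Hypothetical helper: return a copy of $text with matching tags removed.
sub strip_tags {
    my ($text, $pattern) = @_;
    $text =~ s/$pattern//g;
    return $text;
}

# Hypothetical sample input.
my $sample = '<b>x</b> <TYPE> <TYPE abc="def"> <DESCRIPTION> <TYPEWRITER>';

print "strict:  [", strip_tags($sample, $strict),  "]\n";
print "lenient: [", strip_tags($sample, $lenient), "]\n";
```

With this sample, the strict form should keep only the bare <TYPE> and <DESCRIPTION>, while the lenient form should also keep <TYPE abc="def">; both drop <TYPEWRITER>, since the tag name has to match as a whole word.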

Chad Uretsky
Lead Network and Security Engineer
NetIQ Corporation
[EMAIL PROTECTED]
Direct: 713-418-5200
www.netiq.com



-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Craig Cardimon
Sent: Thursday, February 24, 2005 8:02 AM
To: ActivePerl
Subject: Best way to selectively strip HTML tags


I need to strip HTML-style "<" and ">" tags and their contents from ASCII text while not disturbing customized tags that might say <TYPE> or <DESCRIPTION>. Is there a way to do this without going bonkers?


I'm using

***

s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;

***

to strip every angle-bracketed tag from the text. It works like gangbusters, but I've discovered it works a little too well: the custom tags go too.
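
A minimal sketch of the catch-all behaviour described above. The sample string is invented for illustration and is not taken from the real data:

```perl
use strict;
use warnings;

# Hypothetical sample text containing ordinary HTML and a custom tag.
my $text = 'Intro <b>bold</b> <TYPE>report</TYPE> <a href="x>y">link</a>';

# The catch-all pattern: removes every <...> tag, treating a quoted
# attribute value (which may itself contain ">") as a single unit.
(my $stripped = $text) =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;

print "$stripped\n";
```

Here <TYPE> and </TYPE> vanish along with the <b> and <a> tags, which is the "works a little too well" part.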

Now I need something I can apply more judiciously. Any hints, tips, or suggestions would be appreciated.


--

Craig Cardimon
AUS INC.
(Knowledge Express Data Systems; 1-800-529-5337, ext. 24)





_______________________________________________
ActivePerl mailing list
[email protected]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
