Chad I. Uretsky wrote:
After I sent my original e-mail, I decided it might be better to stick with something closer to your original regexp. The following works as well, and actually better, since it also eliminates your custom tags if anything other than the tag name is entered (this is accomplished by requiring the closing ">" right after the tag name inside the negative look-ahead assertion, which the negative look-ahead assertion I previously gave you did not do):
$allowed_tags = "TYPE|DESCRIPTION"; s/<(?!(?:$allowed_tags)\s*>)(?:[^>'"]*|(['"]).*?\1)*>//sig;
I found that the original negative look-ahead assertion still permitted "custom" tags of the format <TYPE abc="def">. If you wish to permit something like this, use the looser negative look-ahead assertion (?!{expr}) as opposed to the stricter (?!(?:{expr})\s*>).
Again, I keep the tags you allow in a variable rather than directly in the regexp only to simplify management. You could just as easily write the regexp as:
s/<(?!(?:TYPE|DESCRIPTION)\s*>)(?:[^>'"]*|(['"]).*?\1)*>//sig;
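To see the difference between keeping only bare custom tags and keeping custom tags that carry attributes, here is a minimal sketch. The sample string and the variable names $text, $strict and $loose are made up for illustration, and the two patterns are one possible way to write the strict and loose negative look-ahead, not the only one:

```perl
use strict;
use warnings;

# Hypothetical sample input; only the tag names come from this thread.
my $allowed_tags = "TYPE|DESCRIPTION";
my $text = q{<p class="x">Hi</p> <TYPE>memo <TYPE abc="def">x <DESCRIPTION>y <b>z</b>};

# Strict: keep an allowed tag only when nothing but optional
# whitespace sits between the tag name and the closing ">".
(my $strict = $text) =~ s/<(?!(?:$allowed_tags)\s*>)(?:[^>'"]*|(['"]).*?\1)*>//sig;

# Loose: keep any tag whose name is on the list, attributes included.
# The \b is an added assumption so that a name such as TYPEWRITER
# is not mistaken for TYPE.
(my $loose = $text) =~ s/<(?!(?:$allowed_tags)\b)(?:[^>'"]*|(['"]).*?\1)*>//sig;

print "strict: $strict\n";   # strict: Hi <TYPE>memo x <DESCRIPTION>y z
print "loose:  $loose\n";    # loose:  Hi <TYPE>memo <TYPE abc="def">x <DESCRIPTION>y z
```

Neither pattern keeps closing tags such as </TYPE>. If those should survive too, an optional \/? in front of the tag list inside the look-ahead is one way to allow them.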
Chad Uretsky
Lead Network and Security Engineer
NetIQ Corporation
[EMAIL PROTECTED]
Direct: 713-418-5200
www.netiq.com
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Craig Cardimon
Sent: Thursday, February 24, 2005 8:02 AM
To: ActivePerl
Subject: Best way to selectively strip HTML tags
I need to strip HTML-style "<" and ">" tags and their contents from ASCII text while not disturbing customized tags that might say <TYPE> or <DESCRIPTION>. Is there a way to do this without going bonkers?
I'm using
***
s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;
***
to strip all angle braces from the text. It works like gangbusters, but I've discovered it works a little too well.
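As a rough illustration of "too well", the sketch below runs that same substitution over a made-up line (the sample text and variable names are hypothetical). The custom tag disappears along with the ordinary HTML:

```perl
use strict;
use warnings;

# Hypothetical sample line mixing ordinary HTML with a custom tag.
my $line = q{<a href="x>y">link</a> <TYPE>memo};

# The pattern quoted above: remove every tag, treating a ">" inside
# a quoted attribute value as part of the tag rather than its end.
(my $stripped = $line) =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;

print "$stripped\n";   # link memo
```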
Now I need something I can apply more judiciously. Any hints, tips, or suggestions would be appreciated.
--
Craig Cardimon AUS INC. (Knowledge Express Data Systems; 1-800-529-5337, ext. 24)
