you need a regex that matches everything between < and >, except for the
tags you want to keep, like <br> or <b>.
how about:
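REReplace(my_html_file, "<[^>]*>", "", "ALL")
this should:
match a '<', then any run of characters that are not '>', then the closing
'>', and replace the whole thing with ''. (a class like [A-Za-z0-9_/] would
miss tags that carry attributes, such as <a href="...">, because of the
spaces, quotes and '='.) note that this strips every tag, <br> and <b>
included, so keeping those takes an extra step, e.g. swapping them for
placeholders first and putting them back afterwards.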
this is an explanation of regex features in javascript as implemented by
netscape (you could parse text given to you in a form through javascript
before cf even sees it):
http://developer.netscape.com/docs/manuals/js/client/jsref/regexp.htm
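for the javascript route, here is a rough sketch of what the form-side
clean-up could look like. the function names stripAllTags and stripTagsExcept
are made up for this example, and it assumes a javascript engine that
supports lookahead in regexes, so treat it as a starting point rather than
tested production code:

```javascript
// Hypothetical helpers for illustration; these names are not from any library.

// Remove every tag: "<", then anything that is not ">", then ">".
function stripAllTags(html) {
  return html.replace(/<[^>]*>/g, "");
}

// Remove every tag except those whose name is in `keep`, e.g. ["br", "b"].
// Assumes the names in `keep` are plain letters/digits (no regex metacharacters).
function stripTagsExcept(html, keep) {
  // Negative lookahead: do not match tags whose name is on the keep list.
  // \b stops "b" from also protecting "body" or "blockquote".
  var pattern = new RegExp("<(?!\\/?(?:" + keep.join("|") + ")\\b)[^>]*>", "gi");
  return html.replace(pattern, "");
}

var sample = '<p class="x">Hello <b>world</b><br>bye</p>';
console.log(stripAllTags(sample));                 // Hello worldbye
console.log(stripTagsExcept(sample, ["br", "b"])); // Hello <b>world</b><br>bye
```

both versions assume there is no literal '>' inside an attribute value; a
tag like <img alt="a > b"> would be cut short at the first '>'.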
but you need to get the exact syntax that allaire parses. mike dinowitz did
a great write-up on this very subject, and i think it is chapter 12 of ben
forta's green book, not to mention his web sites.
if you can, try doing this in perl beforehand; i am sure it is speedier than
cf in this area.
good luck,
Alexander Sicular
Technical Director, Information Technology
The Neurological Institute of New York
Columbia Presbyterian Medical Center
212.305.1318
[EMAIL PROTECTED]
-----Original Message-----
From: Randy Pringle [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, February 07, 2001 1:43 AM
To: CF-Talk
Subject: Parsing HTML in ColdFusion
We need to parse an HTML page, and remove all HTML tags. Could someone
please explain how to do this in ColdFusion? There is a component in ASP
that allows this sort of thing, but we prefer to do it ColdFusion.
Any help would be greatly appreciated.
Randy Pringle & Khalifa Al-Kuwari
RasGas
Doha
Qatar