you need a regex that matches everything between < and >, except for the
tags you want to keep, like <br> or <b>.

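how about:

rereplace(#my_html_file#, '<[^>]*>', '', 'all')

this should:
match a '<', then any run of characters that are not '>', up to the closing
'>', and replace the whole tag with ''. (a character class like
[A-Za-z0-9_/] would skip tags that carry attributes, such as <a href="...">.)
note that this strips every tag, including <br> and <b>.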

this is an explanation of regex features in javascript as implemented by
netscape (you could parse text given to you in a form through javascript
before cf even sees it):
http://developer.netscape.com/docs/manuals/js/client/jsref/regexp.htm
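
as a rough sketch of that client-side idea, something like the following
could strip tags while keeping a short list such as <b> and <br>. the
function name stripTags and its keep parameter are made up for illustration,
and the pattern only handles simple, well-formed tags (it does not deal with
comments, scripts, or a '>' inside an attribute value):

```javascript
// Hypothetical helper: stripTags and its keep-list parameter are illustrative names.
function stripTags(html, keep) {
  // Lower-case the keep-list once so tag names compare case-insensitively.
  var allowed = (keep || []).map(function (name) {
    return name.toLowerCase();
  });
  // Match an opening or closing tag: '<', an optional '/', a tag name,
  // then any characters that are not '>' (the attributes), then '>'.
  return html.replace(/<\/?([A-Za-z][A-Za-z0-9]*)[^>]*>/g, function (tag, name) {
    // Keep the tag untouched if its name is on the keep-list, else drop it.
    return allowed.indexOf(name.toLowerCase()) !== -1 ? tag : '';
  });
}

var sample = '<p class="x">Hello <b>world</b><br><a href="#">link</a></p>';
console.log(stripTags(sample, ['b', 'br']));
```

calling stripTags(sample) with no keep-list removes every tag. anything done
in the browser can be bypassed, so the server-side rereplace is still needed.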

but you need to get the exact syntax that allaire's parser accepts. mike
dinowitz did a great write-up on this very subject, and i think it is
chapter 12 of ben forta's green book, not to mention his web sites.

if you can, try doing this in perl beforehand; i am sure it is speedier than
cf in this area.

good luck,

Alexander Sicular
Technical Director, Information Technology
The Neurological Institute of New York
Columbia Presbyterian Medical Center
212.305.1318
[EMAIL PROTECTED]



-----Original Message-----
From: Randy Pringle [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, February 07, 2001 1:43 AM
To: CF-Talk
Subject: Parsing HTML in ColdFusion


We need to parse an HTML page, and remove all HTML tags. Could someone 
please explain how to do this in ColdFusion? There is a component in ASP 
that allows this sort of thing, but we prefer to do it in ColdFusion.

Any help would be greatly appreciated.

Randy Pringle & Khalifa Al-Kuwari

RasGas
Doha
Qatar
