There's a ready-made function on CFLib.org that will strip HTML tags: http://cflib.org/udf/stripHTML
You could use a quick regex to grab just the contents of the body tag, then run that string through the StripHTML function. That leaves you with the text that sat inside tags like <p>, <div>, etc., with the tags themselves removed. At that point you can do whatever you like with the result.

andy

-----Original Message-----
From: Anthony Webb [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 09, 2008 12:04 PM
To: CF-Talk
Subject: Extract text from webpage content using cfhttp

I need to index web page contents for doing Verity (or similar) searching. I'd like to insert just the text that a web page returns and not any of the other stuff (like HTML, JS, CSS, images, etc.). I noticed that cfhttp.filecontent returns the entire contents of the page; does anyone have a good way to get at just the text?

Also, I am storing the results in a MySQL database and was anticipating using the "text" data type. I assume that is the best way to go?
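For illustration, here's a minimal sketch of the two-step idea (grab the body, then strip the tags), written in Python only because it's easy to run standalone. In ColdFusion you'd apply the same regexes to cfhttp.filecontent with REFindNoCase/REReplaceNoCase. The function names are hypothetical and this is not the CFLib stripHTML code. One assumption worth calling out: it removes <script> and <style> blocks before stripping tags, since a plain tag strip would leave the JS and CSS source in the text you index. It also decodes entities like &amp; after the tags are gone.

```python
import html
import re

# Hypothetical helpers illustrating "extract body, then strip tags".
# Not the CFLib stripHTML UDF; just the same general technique.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)
SCRIPT_STYLE_RE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
TAG_RE = re.compile(r"<[^>]+>")
WS_RE = re.compile(r"\s+")


def extract_body(page: str) -> str:
    """Return what sits between <body ...> and </body>, or the whole page if there is no body tag."""
    m = BODY_RE.search(page)
    return m.group(1) if m else page


def strip_html(fragment: str) -> str:
    """Drop script/style blocks, replace remaining tags with spaces,
    decode entities, and collapse runs of whitespace."""
    no_code = SCRIPT_STYLE_RE.sub(" ", fragment)
    no_tags = TAG_RE.sub(" ", no_code)
    # Decode entities only after tags are removed, so "&lt;b&gt;" stays text.
    decoded = html.unescape(no_tags)
    return WS_RE.sub(" ", decoded).strip()


def page_text(page: str) -> str:
    """Full pipeline: body contents first, then tag stripping."""
    return strip_html(extract_body(page))


if __name__ == "__main__":
    sample = (
        "<html><head><style>p{color:red}</style></head>"
        '<body class="main"><p>Hello <b>world</b> &amp; co</p>'
        "<script>var x = 1;</script><div>Bye</div></body></html>"
    )
    print(page_text(sample))  # Hello world & co Bye
```

Regexes are a blunt tool for HTML: malformed markup or a ">" inside an attribute value can throw them off, so if the pages you're indexing are messy, a real HTML parser will be more reliable.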

