If you control the templates that generate the HTML, you can make this
very easy on yourself by wrapping the text blocks in a marker that you can
find in cfhttp.filecontent later. For example, wrap the text in a pair of
HTML comments:

<!-- [BEGIN TEXT TO GRAB] --> This is the text you want indexed <!-- [END TEXT TO GRAB] -->

Then search cfhttp.filecontent for the blocks of text between the two
comment markers.
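
A rough sketch of that search, written in Python rather than CFML purely to
illustrate the idea. The function name grab_text_blocks and the sample page
are made up for this example, and page_source stands in for whatever
cfhttp.filecontent returned. It assumes the markers appear exactly as
written above and are never nested.

```python
import re

# Marker comments, assumed to match the template output character for character.
BEGIN = "<!-- [BEGIN TEXT TO GRAB] -->"
END = "<!-- [END TEXT TO GRAB] -->"

# Non-greedy (.*?) pairs each BEGIN with the nearest END that follows it;
# re.DOTALL lets a marked block span several lines.
BLOCK_RE = re.compile(re.escape(BEGIN) + r"(.*?)" + re.escape(END), re.DOTALL)


def grab_text_blocks(page_source):
    """Return the text found between each BEGIN/END marker pair, trimmed."""
    return [block.strip() for block in BLOCK_RE.findall(page_source)]


# Hypothetical page content, standing in for the fetched HTML.
sample_page = (
    "<html><body><div id='nav'>menu</div>\n"
    "<!-- [BEGIN TEXT TO GRAB] --> This is the text you want\n"
    "indexed <!-- [END TEXT TO GRAB] -->\n"
    "<script>var x = 1;</script>\n"
    "<!-- [BEGIN TEXT TO GRAB] -->Second block<!-- [END TEXT TO GRAB] -->\n"
    "</body></html>"
)

for block in grab_text_blocks(sample_page):
    print(block)
```

Anything outside the markers (navigation, scripts, styles) is ignored. Any
HTML tags inside a marked block are returned untouched, so they would still
need stripping before indexing.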

-----Original Message-----
From: Anthony Webb [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 09, 2008 1:04 PM
To: CF-Talk
Subject: Extract text from webpage content using cfhttp

I need to index web page contents for Verity (or similar) searching.
I'd like to insert just the text that a web page returns and not any of the
other stuff (HTML, JS, CSS, images, etc.).

I noticed that cfhttp.filecontent returns the entire contents of the page.
Does anyone have a good way to get at just the text?

Also, I am storing the results in a MySQL database and was anticipating
using the "text" data type. I assume that is the best way to go?



Archive: 
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:308835