http://cflib.org/udf/removeHTML
http://cflib.org/udf/stripHTML
http://cflib.org/udf/tagStripper
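For anyone who wants to see the general idea spelled out, here is a minimal sketch in Python rather than ColdFusion. It is not taken from the UDFs above. The function names contains_markup and to_plain_text are made up for illustration, and the tag pattern is a deliberately naive assumption: it will also eat text such as "a < b and c > d", and it replaces each tag with a space. It covers the three things asked for: a 'find' pass to detect markup, stripping tags (including ones published as &lt; and &gt;), and turning entities such as &amp; back into plain characters.

```python
import html
import re

# Naive tag pattern (an assumption for this sketch): "<", then anything
# that is not ">", then ">". It does not understand comments or CDATA.
TAG_RE = re.compile(r"<[^>]+>")


def contains_markup(text):
    """Return True if the text has literal tags or entity-escaped tags."""
    # Check the raw text, then the decoded text so &lt;div&gt; is caught too.
    return bool(TAG_RE.search(text) or TAG_RE.search(html.unescape(text)))


def to_plain_text(text):
    """Strip tags and decode entities, returning whitespace-normalised text."""
    # Decode once so tags published as &lt;div&gt; become real tags.
    decoded = html.unescape(text)
    # Replace each tag with a space so adjacent words do not run together.
    stripped = TAG_RE.sub(" ", decoded)
    # Decode again to handle double-escaped entities such as &amp;amp;.
    plain = html.unescape(stripped)
    # Collapse runs of whitespace left behind by the removed tags.
    return " ".join(plain.split())


if __name__ == "__main__":
    sample = '<p>Hello <a href="x">world</a> &amp; friends</p>'
    if contains_markup(sample):
        print(to_plain_text(sample))
```

The detect step is kept separate from the clean step so the user can be asked before anything is changed. The tag pattern itself is ordinary regex syntax, so it should carry over to a ColdFusion REReplace call, though I have not tested that here.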
On Wed, Apr 29, 2009 at 4:50 AM, Robert Rawlins <robert.rawl...@thinkbluemedia.co.uk> wrote:

> Hey Chaps,
>
> I've been doing a little work with some RSS feeds of late, and for the
> most part all is well. The one problem I'm running into is people who
> publish RSS feeds containing lots of junk HTML (urgh!), like inline links,
> images, divs and whatnot, in the description content of the feed.
>
> I only want the plain text version of these feeds and not all the other
> junk. This means stripping out the HTML tags (<div>, <a>, etc.), some of
> which are being published as &lt; and &gt;. I also want to convert HTML
> entities into their plain text equivalents, for instance making &amp;
> just a standard &.
>
> Presumably this can all be done with regex (I couldn't find any nice
> built-in CF functions), but my skills in this area are pretty much
> non-existent, and I know some of you are fairly experienced with this
> kind of thing.
>
> I'm also hoping to do some form of regex 'find' with the rules first, so
> I can say to the user 'this feed appears to contain lots of redundant
> crap, would you like it cleaned for you? This may cause formatting
> issues.' or something to that effect. I can then run the replace rules
> if they choose to do so.
>
> I'd appreciate any advice.
>
> Rob