On Wed, Apr 29, 2009 at 4:50 AM, Robert Rawlins wrote:

> Hey Chaps,
> I've been doing a little work with some RSS feeds of late, and for the most
> part all is well. The one problem I'm running into is people who publish
> RSS feeds containing lots of junk HTML (urgh!), like inline links, images,
> divs and whatnot, in the description content of the feed.
> I only want the plain text version of these feeds and not all the other
> junk. This means stripping out HTML tags such as <div> and <a>, some of
> which are being published escaped as &lt; and &gt;. I also want to convert
> HTML character entities into their plain text equivalents, for instance
> making &amp; just a standard &.
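
A minimal sketch of the clean-up described above, written in Python rather than CFML purely for illustration. The names `TAG_RE` and `to_plain_text` are made up for this example. It decodes entities first, so that tags published as &lt;div&gt; turn into real tags, and then strips anything that looks like a tag. The pattern itself is not Python-specific and would likely carry over to ColdFusion's REReplace, but that is an assumption and not something verified here.

```python
import html
import re

# Hypothetical pattern: "<" or "</" followed by a letter, up to the next ">".
# Requiring a letter right after "<" leaves text such as "5 < 6" alone.
TAG_RE = re.compile(r"</?[a-zA-Z][^>]*>")


def to_plain_text(description: str) -> str:
    """Return a plain-text version of an RSS description field."""
    # Decode entities first: "&lt;div&gt;" becomes "<div>", "&amp;" becomes "&".
    decoded = html.unescape(description)
    # Replace each tag with a space so adjacent blocks do not run together.
    stripped = TAG_RE.sub(" ", decoded)
    # Collapse the leftover runs of whitespace.
    return " ".join(stripped.split())


if __name__ == "__main__":
    sample = '<div>Fish &amp; chips <a href="http://example.com">here</a></div>'
    print(to_plain_text(sample))  # Fish & chips here
```

Replacing tags with a space rather than an empty string is a trade-off: it keeps "<p>one</p><p>two</p>" from becoming "onetwo", at the cost of splitting words that contain inline tags. HTML comments and script or style bodies are not handled by this sketch.
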
> Presumably this can all be done with regex (I couldn't find any suitable
> built-in CF functions), but my skills in this area are pretty much
> non-existent, and I know some of you are fairly experienced with this
> kind of thing.
> I'm also hoping to run some form of regex-based 'find' using the same
> rules first, so I can say to the user 'This feed appears to contain lots
> of redundant crap, would you like it cleaned for you? This may cause
> formatting issues.' or something to that effect. I can then apply the
> replace rules if they choose to do so.
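
The 'find' step could be sketched along these lines, again in Python for illustration only. `MARKUP_RE`, `count_markup`, `looks_like_html` and the `min_tags` threshold are hypothetical names and values chosen for this example. The idea is to count tag-like sequences, both raw and escaped as &lt;...&gt;, and prompt the user only when the count reaches the threshold.

```python
import re

# Hypothetical pattern: a raw "<" or an escaped "&lt;", an optional "/",
# a letter, then the shortest run up to a raw ">" or an escaped "&gt;".
MARKUP_RE = re.compile(r"(?:<|&lt;)/?[a-zA-Z][^<>]*?(?:>|&gt;)")


def count_markup(description: str) -> int:
    """Count tag-like sequences, raw or entity-escaped, in a description."""
    return len(MARKUP_RE.findall(description))


def looks_like_html(description: str, min_tags: int = 1) -> bool:
    """Return True when the description holds at least min_tags tag-like runs."""
    return count_markup(description) >= min_tags


if __name__ == "__main__":
    for text in ("<div>hi</div>", "Fish &amp; chips, 5 < 6"):
        print(repr(text), looks_like_html(text))
```

This is a heuristic and can misfire: text such as "a<b and c>d" would be counted as a tag, so raising `min_tags` may be worth considering before prompting the user.
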
> I'd appreciate any advice.
> Rob
