http://cflib.org/udf/removeHTML
http://cflib.org/udf/stripHTML
http://cflib.org/udf/tagStripper
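
If you'd rather roll your own than drop in one of those, here is a rough sketch of the idea. To be clear, this is not the code from the cflib UDFs above. The function names feedLooksDirty and cleanFeedText are made up for illustration, and the entity list only covers a handful of common cases. It assumes the escaped tags show up as plain &lt; and &gt;.

```cfm
<cfscript>
// Hypothetical helper: returns true if the text appears to contain markup,
// either literal tags or tags escaped as &lt;...&gt;.
function feedLooksDirty(text) {
    return reFindNoCase("(<|&lt;)/?[a-z][^<>]*(>|&gt;)", arguments.text) GT 0;
}

// Hypothetical helper: strips tags and decodes a few common entities.
function cleanFeedText(text) {
    var result = arguments.text;

    // Unescape angle brackets first so escaped tags get stripped as well.
    result = replaceNoCase(result, "&lt;", "<", "all");
    result = replaceNoCase(result, "&gt;", ">", "all");

    // Remove anything that looks like an opening or closing tag.
    result = reReplace(result, "</?[a-zA-Z][^>]*>", "", "all");

    // Decode a few common entities. &amp; goes last so that text such as
    // "&amp;quot;" is not decoded twice.
    result = replaceNoCase(result, "&quot;", """", "all");
    result = replaceNoCase(result, "&nbsp;", " ", "all");
    result = replaceNoCase(result, "&amp;", "&", "all");

    return trim(result);
}
</cfscript>
```

You could call feedLooksDirty() on each item's description to decide whether to show the 'would you like it cleaned?' prompt, and only run cleanFeedText() if the user says yes. A regex like this will not cope with every piece of malformed HTML, so treat it as a starting point.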

On Wed, Apr 29, 2009 at 4:50 AM, Robert Rawlins <
robert.rawl...@thinkbluemedia.co.uk> wrote:

>
> Hey Chaps,
>
> I've been doing a little work with some RSS feeds of late, and for the most
> part all is very well. The one problem I'm running into is people who
> publish RSS feeds containing lots of junk HTML (urgh!), like inline links,
> images, divs and whatnot in the description content of the feed.
>
> I only want to have the plain text version of these feeds and not all the
> other junk. This means stripping out the HTML tags <div>, <a> etc., some of
> which are being published as &lt; and &gt;. Also, I want to convert
> HTML-encoded characters into their plain text equivalents, for instance
> making &amp; just a standard &.
>
> Now presumably this can all be done with regex (I couldn't find any nice
> built-in CF functions), but my skills in this area are pretty much
> non-existent. I know some of you are fairly experienced with this
> kind of thing.
>
> I'm also hoping that I'll be able to do some form of regex-based 'find'
> with the same rules first, so I can say to the user 'This feed appears to
> contain lots of redundant crap, would you like it cleaned for you? This may
> cause formatting issues.' or something to that effect. I can then process
> the replace rules if they choose to do so.
>
> I'd appreciate any advice.
>
> Rob
>
> 

Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:322055