As part of a mammoth custom tag to clean up HTML
markup submitted through a form, I'm trying to work out how
to escape all ampersands.

Only, of course, I don't want to escape those that are
already part of character entities. So I want this:

<a href="index.cfm?action=home.welcome&msg=1">
Go to home page & see a message</a>

to become this:

<a href="index.cfm?action=home.welcome&amp;msg=1">
Go to home page &amp; see a message</a>

BUT I obviously don't want this:

<a href="index.cfm?action=home.welcome&amp;msg=1">
Go to home page &amp; see a clich&eacute;d message</a>

to become this:

<a href="index.cfm?action=home.welcome&amp;amp;msg=1">
Go to home page &amp;amp; see a clich&amp;eacute;d message</a>

Catching numeric entities seems fine - just match against
"&[^#]". But what about the named entities? I've got a list
of them, but I don't think

&[^(#|lsquo;|rsquo;|sbquo;|ldquo;|rdquo;|bdquo;|dagger;)]

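works, because a [] character class only ever matches one character
at a time, so the parentheses, pipes and entity names inside it are
read as a set of single characters rather than a list of alternatives.
I'm working my way through Jeffrey Friedl's book on regular
expressions, but I've not found a solution yet.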
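
One possible direction is a negative lookahead, which allows a whole alternative (not just one character) to be excluded after the ampersand. This is only a sketch in Python, not ColdFusion, and it assumes a regex engine that supports lookahead, which the engine behind the custom tag may not. The names BARE_AMP and escape_bare_ampersands are made up for illustration, and the pattern treats anything shaped like &name;, &#123; or &#x1F; as an existing entity rather than checking against a list:

```python
import re

# An ampersand that is NOT followed by something shaped like an entity.
# Assumption: entities look like &name; or &#123; or &#x1F; (with the
# trailing semicolon); anything else counts as a bare ampersand.
BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)

def escape_bare_ampersands(markup: str) -> str:
    """Escape ampersands that are not already part of an entity."""
    return BARE_AMP.sub("&amp;", markup)

raw = ('<a href="index.cfm?action=home.welcome&msg=1">\n'
       'Go to home page & see a message</a>')
print(escape_bare_ampersands(raw))
```

To accept only a known list of named entities, the name part of the pattern could be swapped for an alternation such as (?:lsquo|rsquo|sbquo|ldquo|rdquo|bdquo|dagger). Note that an entity written without its closing semicolon would be escaped by this sketch.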

Any offers?

- Gyrus

~~~~~~~~~~~~~~~~~~~~~~~~~~~~
work: http://www.tengai.co.uk
play: http://www.norlonto.net
- PGP key available
~~~~~~~~~~~~~~~~~~~~~~~~~~~~