Much, much cleaner (if it works). I always forget that most characters don't
need escaping inside classes. Lately, I don't get as much regex exercise
as I used to.
As for the ñ, I thought it was odd, since the docs say \w is the equivalent
of A-Za-z0-9_, yet when I used \w it matched ñ, where A-Za-z0-9_ obviously
would not.
There is a definite difference between the two as far as accented characters
go.
I was on CF 8 when I briefly tested what I came up with. I just ran yours
through the same test with different results.
<cfsavecontent variable="str">
aóbcñ...@#$%^&*()_+=-\][|}{';"":/.?>,<`~
FG*!&^$
</cfsavecontent>
<cfoutput>#rereplace(str, '[^\w\r \n!.?''"()&,;:]+', '<span
class="highlight">\0</span>', 'all')#</cfoutput>
This will not span the accented characters. Replacing \w with A-Za-z0-9_
will.
I think the best of both worlds would be "[^A-Za-z0-9_\r \n!.?''"()&,;:]+"
That is, assuming the OP wants to highlight the accented chars as well.
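
For anyone who wants to see the \w versus A-Za-z0-9_ split outside of CF, here is a rough sketch in Python. This is NOT the engine CF uses; Python's re module is just one of the engines where \w is Unicode-aware by default, so it only illustrates why the explicit class is the predictable choice. The sample string, the names, and the [ ] markers standing in for the span are assumptions for illustration.

```python
import re

# Shortened version of the test string from this message.
SAMPLE = "aóbcñ...@#$%^&*()_+=-"

# Same character classes as discussed above; only the "word" part differs.
UNICODE_W = re.compile(r"""[^\w\r \n!.?'"()&,;:]+""")
ASCII_ONLY = re.compile(r"""[^A-Za-z0-9_\r \n!.?'"()&,;:]+""")


def highlight(pattern, text):
    # \g<0> is Python's spelling of the whole-match backreference (\0 in CF).
    # Square brackets stand in for the <span class="highlight"> wrapper.
    return pattern.sub(r"[\g<0>]", text)


with_w = highlight(UNICODE_W, SAMPLE)
with_ascii = highlight(ASCII_ONLY, SAMPLE)

print(with_w)      # accented letters pass through unmarked in this engine
print(with_ascii)  # accented letters are wrapped
```

With the explicit A-Za-z0-9_ class the result does not depend on how a given engine defines \w, which is the point of the pattern suggested above.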
Thanks for the escape reminder and the \0 tip.
.:.:.:.:.:.:.:.:.:.:.:.:.:.
Bobby Hartsfield
http://acoderslife.com
-----Original Message-----
From: Peter Boughton [mailto:[email protected]]
Sent: Tuesday, October 05, 2010 5:55 AM
To: cf-talk
Subject: Re: Highlighting non-standard ASCII characters?
Hmmm, although it works, that code is not quite correct - there are a few
issues with it.
>> If you don't mind characters like ñ, then just use \w instead of
A-Za-z0-9_
This is *incorrect* - in ColdFusion regex, \w does NOT include accented
characters. There are other regex engines where it does, but the Apache ORO
used by CF doesn't. (Unless that's changed with CF9 anyhow, but I suspect
not..)
There are several unnecessary escapes since ".?()" do not need escaping
inside classes.
(However, if '-' wants to be included (it's not currently) then it should be
escaped as '\-' so it's not treated as a range.)
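
A small sketch of the hyphen point, written in Python rather than CF (so treat it as an illustration of how character-class ranges generally behave, not as a test of CF's engine). The probe string and names are made up for the example.

```python
import re

PROBE = "a&b-c(d)"

# Unescaped: '!-.' is read as the range from '!' (0x21) through '.' (0x2E),
# which also takes in characters such as & ( ) * + and ,
HYPHEN_AS_RANGE = re.compile(r"[!-.]")

# Escaped: only the three literal characters '!', '-' and '.'
HYPHEN_ESCAPED = re.compile(r"[!\-.]")

range_hits = HYPHEN_AS_RANGE.findall(PROBE)
literal_hits = HYPHEN_ESCAPED.findall(PROBE)

print(range_hits)
print(literal_hits)
```

The unescaped form quietly matches characters that were never typed into the class, which is why escaping the hyphen (or placing it first or last) is the safer habit.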
By including \s you're not just saying space, you're *also* including \r and
\n and \t and \v. So either use a literal space (to exclude tabs) or, to
allow tabs, use \s and drop the \r and \n, since they're just adding noise.
[^\w\r \n!.?''"()&,;:] or [^\w\s!.?''"()&,;:]
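
To make the difference between those two classes concrete, here is a minimal Python sketch (again, Python's re and not CF's engine, so only the general \s behaviour is being shown). The text and names are assumptions for illustration, and the punctuation has been left out of the classes to keep the focus on whitespace.

```python
import re

TEXT = "one\ttwo three\r\nfour"

# Literal space plus \r and \n: a tab is not in the allowed set, so it is flagged.
LITERAL_SPACE = re.compile(r"[^\w\r \n]+")

# \s already covers space, tab, CR and LF, so nothing here is flagged.
ANY_WHITESPACE = re.compile(r"[^\w\s]+")

flagged_literal = LITERAL_SPACE.findall(TEXT)
flagged_any = ANY_WHITESPACE.findall(TEXT)

print(flagged_literal)
print(flagged_any)
```

Which form is right depends on whether tabs should be highlighted as unwanted characters or allowed through.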
Outside of the character class, the outer group is redundant - regex already
captures the match to \0 so just do:
rereplace( str , '[^\w\r \n!.?''"()&,;:]' , '<span
class="highlight">\0</span>' , 'all' )
And finally, one more optimisation - to avoid a long series of HTML spans,
just add a + to collect multiple characters together:
rereplace( str , '[^\w\r \n!.?''"()&,;:]+' , '<span
class="highlight">\0</span>' , 'all' )
Hope this helps. :)