Strange question, I'm hoping somebody knows...

I just noticed today (or rather, John Ashenfelter pointed out to me :)
that some regular expression engines support \uXXXX to match Unicode
characters.

So I did a couple of tests on some data we had in a legacy database (it
doesn't have any nvarchar columns... yet) and it turns out that using
the \uXXXX pattern in a regular expression will mangle the hell out of
an ASCII string... I'm also assuming (and I may be way off base) that
using #chr(1-32)# (for a nonprinting ASCII character) won't always
match the same character in a Unicode string. (I would think it
depends on whether the string uses a single-byte or double-byte
encoding for the individual character, since I recall reading that
Unicode doesn't always use a double-byte representation, but I suspect
the regex engine does if you use \uXXXX.)

So this brings up a couple of questions:

1) Does the regex engine in CF 6-7 support \uXXXX?

2) Am I wrong about #chr(1-32)#? Will it always match the same
character in a UTF-8 string?

3) If CF 6+ supports \uXXXX and #chr(1-32)# won't always match the
Unicode equivalent, is there a way to test a string in CF to determine
if it's Unicode (digging into Java maybe)?
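
To make question 3 concrete, here is a minimal sketch in plain Java (not
CFML) of the kind of test I have in mind. It rests on one fact and a few
assumptions. The fact: a Java String is always a sequence of UTF-16
code units, whatever encoding the data had in the database, so a
vertical tab is the single code unit 11 either way. The assumptions: CF
6+ strings are ordinary Java strings, and the class and method names
(UnicodeCheck, isAscii, hasVerticalTab) are made up for illustration.
It uses java.util.regex, which may not be the engine CF uses
internally, so it says nothing about question 1.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class UnicodeCheck {

    // True if every UTF-16 code unit in s is in the 7-bit ASCII range.
    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    // True if s contains a vertical tab (code 11), matched with the
    // regex engine's backslash-u escape rather than a literal character.
    static boolean hasVerticalTab(String s) {
        return Pattern.compile("\\u000B").matcher(s).find();
    }

    public static void main(String[] args) {
        String plain = "abc" + (char) 11 + "def";   // ASCII plus a vertical tab
        String accented = "caf\u00E9" + (char) 11;  // non-ASCII plus a vertical tab

        System.out.println(isAscii(plain));           // true
        System.out.println(isAscii(accented));        // false
        System.out.println(hasVerticalTab(plain));    // true
        System.out.println(hasVerticalTab(accented)); // true
    }
}
```

If that holds inside CF as well, the escape and the chr() value should
find the same character whether or not the rest of the string is plain
ASCII, but I haven't confirmed that CF behaves this way.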


The reason I need this info is that I also just realized that none of
the non-printing characters are valid in an XML 1.0 document, with the
exception of tab (9), line feed (10), carriage return (13) and space
(32). This database has some data containing vertical tabs (11), which
I'm guessing were pasted from MS Word, so this is liable to be a
recurring problem. I need to find a way to strip these characters
reliably from a string without mangling the rest of the string.
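
For reference, a rough sketch in plain Java of the stripping I'm after.
It keeps only code points in the ranges the XML 1.0 spec allows (tab,
line feed, carriage return, 0x20-0xD7FF, 0xE000-0xFFFD and
0x10000-0x10FFFF) and drops everything else, including vertical tabs.
The class and method names (XmlCleaner, isXmlChar, stripInvalidXmlChars)
are hypothetical, and it assumes Java 5 or later for the code point
methods, which may be newer than the JVM a given CF install runs on.

```java
// Hypothetical helper class; the names are illustrative only.
class XmlCleaner {

    // True if the code point is allowed by the XML 1.0 Char production.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of s with every disallowed code point removed.
    // Walks by code point so valid surrogate pairs stay intact.
    static String stripInvalidXmlChars(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (isXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String dirty = "line one" + (char) 11 + "line two\tend\n";
        String clean = stripInvalidXmlChars(dirty);
        // The vertical tab is gone; the tab and line feed survive.
        System.out.println(clean.equals("line oneline two\tend\n")); // true
    }
}
```

Filtering by code point rather than by byte means it doesn't matter
whether the original data was single-byte or double-byte, which is the
part I was worried about getting wrong with a regex.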


s. isaac dealey     434.293.6201
new epoch : isn't it time for a change?

add features without fixtures with
the onTap open source framework

http://www.fusiontap.com
http://coldfusion.sys-con.com/author/4806Dealey.htm

