Strange question, I'm hoping somebody knows... I just noticed today (or rather, John Ashenfelter pointed out to me :) ) that some regular expression engines support \uXXXX to match Unicode characters.
So I did a couple of tests on some data we had in a legacy database (it doesn't have any nvarchar columns... yet), and it turns out that using the \uXXXX pattern in a regular expression will mangle the hell out of an ASCII string...

I'm also assuming (and I may be way off base) that using #chr(1-32)# (for a nonprinting ASCII character) won't always match the same character in a Unicode string. (I would think it depends on whether the string uses a single-byte or double-byte representation for the individual character, since I recall reading that Unicode doesn't always use a double-byte representation, but I suspect the regex engine does if you use \uXXXX.)

So this brings up a couple of questions:

1) Does the regex engine in CF 6-7 support \uXXXX?
2) Am I wrong about #chr(1-32)#? Will it always match the same character in a UTF-8 string?
3) If CF 6+ supports \uXXXX and #chr(1-32)# won't always match the Unicode equivalent, is there a way to test a string in CF to determine if it's Unicode (digging into Java maybe)?

The reason I need this info is that I also just realized that none of the nonprinting characters are valid in an XML document, with the exception of the tab (9), newline (10, 13) and space (32) characters. This database has some data containing vertical tabs (11), which I'm guessing were pasted from MS Word and as a result are liable to be a recurring problem, so I need to find a way to strip these characters reliably from a string without mangling the string.

s. isaac dealey 434.293.6201
new epoch : isn't it time for a change?
add features without fixtures with the onTap open source framework
http://www.fusiontap.com
http://coldfusion.sys-con.com/author/4806Dealey.htm
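For anyone wanting to experiment with the "digging into Java" route, here is a minimal sketch of stripping the characters the XML 1.0 spec disallows, written in plain Java rather than CFML. It assumes a JVM with java.util.regex (Java 1.4 or later); the class name XmlCharCleaner and the method stripInvalidXmlChars are made up for illustration, and nothing here says whether CF's own REReplace accepts \uXXXX. A Java String holds UTF-16 code units, so character 11 is the same character no matter how the text was encoded on its way in.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of ColdFusion or any library.
public class XmlCharCleaner {

    // Characters outside the XML 1.0 "Char" production that this sketch removes:
    // control characters below U+0020 other than tab (9), LF (10) and CR (13),
    // plus the noncharacters U+FFFE and U+FFFF.
    // Assumption: unpaired surrogates are not handled here.
    private static final Pattern INVALID_XML_CHARS = Pattern.compile(
        "[\\u0000-\\u0008\\u000B\\u000C\\u000E-\\u001F\\uFFFE\\uFFFF]");

    // Returns the input with the disallowed characters removed; null stays null.
    public static String stripInvalidXmlChars(String input) {
        if (input == null) {
            return null;
        }
        return INVALID_XML_CHARS.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // (char) 11 is the vertical tab, the Java counterpart of chr(11) in CF.
        String dirty = "line one" + (char) 11 + "line two\ttabbed";
        System.out.println(stripInvalidXmlChars(dirty));
    }
}
```

Tabs, newlines, spaces and any characters above U+001F (accented letters included) pass through untouched, so the rest of the string should not get mangled. Calling something like this from CF via createObject("java", ...) is an untested idea on my part, not something verified here.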
