Re: SOLVED Re: Bad character crashing cfxml

S. Isaac Dealey Mon, 10 Apr 2006 14:12:25 -0700

> All, thanks for the help with this issue.  Using CDATA
> didn't work as apparently CF does parse whatever's
> within the CDATA block and was still choking on the
> bad character.  I ultimately implemented the rereplace
> suggested by S. Isaac, and after some tweaking it
> worked. Here's the final code:


> <cfset itemNameClean = rereplace(itemName,
>   "[#chr(1)#-#chr(8)#-#chr(11)#-#chr(12)#-#chr(14)#-#chr(28)#-#chr(29)#-#chr(31)#-#chr(38)#]",
>   "", "ALL")>
> <title>#itemNameClean#</title>

> I had to add in chr(28) and chr(29) into the rereplace
> as those are the equivalents to Unicode 0x1c and 0x1d
> which were the bad characters that the user had entered
> somehow.  Also added chr(38) which is the '&' character,
> also a baddie.

> -- Josh

Hi Josh,

Without wanting to sound critical, I think you may need a little more
testing before declaring this issue resolved. One issue is that you
seem to have some extra hyphens in the expression (you just need a bit
of a primer on regex, I think). Another is that you're handling the &
character without also handling the < (&lt;), > (&gt;) or " (&quot;)
characters, which means that if a user enters any of those into the
string, they will also cause problems. This is why my implementation
used both XMLFormat() and the regular expression: neither one of them
independently solved the whole problem.

I'll let you decide about the extra special xml characters. :)
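
For anyone who wants to experiment outside of ColdFusion, here is a rough sketch of the same two-step idea in Python. It is only an illustration, not the code from this thread: the name clean_for_xml is made up, and xml.sax.saxutils.escape is standing in for XMLFormat() on the assumption that the two do a comparable job for these four characters. The character class is the same set of codes as the original expression (1-8, 11-12 and 14-31).

```python
import re
from xml.sax.saxutils import escape

# Control characters that XML 1.0 does not allow: codes 1-8, 11-12 and 14-31.
# Tab (9), line feed (10) and carriage return (13) are legal and are kept.
ILLEGAL_XML_CHARS = re.compile(r"[\x01-\x08\x0b\x0c\x0e-\x1f]")


def clean_for_xml(text):
    """Hypothetical helper: strip illegal control characters, then escape markup."""
    stripped = ILLEGAL_XML_CHARS.sub("", text)
    # escape() handles &, < and > by default; the extra mapping adds the quote.
    return escape(stripped, {'"': "&quot;"})


title = clean_for_xml('Tom & "Jerry" <b>\x1c\x1d\tok')
```

Neither step covers the other's ground here: the regular expression leaves & and < alone, and escape() passes chr(28) and chr(29) through untouched.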

As to the regular expression, here's the explanation of where I see
the problem:

The original expression here:

[#chr(1)#-#chr(8)##chr(11)#-#chr(12)##chr(14)#-#chr(31)#]

is similar to

[a-zA-Z0-9]

Notice in this expression that there are two places where there is no
hyphen between two characters, at "zA" and at "Z0". This is because of
the way the hyphen is interpreted within the class designated by the [
and ] characters. The class itself tells the regular expression engine
to match any one character within the class, so [ab] will match the
letter "a" or the letter "b". The hyphen then allows you to specify a
range of characters (in ASCII or Unicode numeric order), so that [a-c]
will match the letter "a" or the letter "b" or the letter "c". The
reason why many people use a-zA-Z instead of a single range A-z is
that, in an ASCII table, the upper-case letters (65-90) come before
the lower-case letters (97-122), and there are six non-alpha
characters between the letter "Z" and the letter "a": [ \ ] ^ _ and `.
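
The points above about listed characters, ranges and the gap between the two letter cases can be checked with a few lines of Python. This is just a sketch using Python's re module rather than ColdFusion's rereplace, on the assumption that both engines read these simple classes the same way; the variable names are only for illustration.

```python
import re

# A class without a hyphen matches exactly the characters listed.
listed = re.findall(r"[ab]", "abcd")

# A hyphen between two characters means the whole range between them.
ranged = re.findall(r"[a-c]", "abcd")

# The characters whose codes fall between "Z" (90) and "a" (97).
between = "".join(chr(code) for code in range(ord("Z") + 1, ord("a")))

# The single range A-z matches those six non-letters as well ...
leaked = re.findall(r"[A-z]", between)

# ... while the two separate ranges do not match any of them.
safe = re.findall(r"[a-zA-Z]", between)
```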

Now, in your expression above, you've added several hyphens, so your
expression is roughly equivalent to

[a-z-A-Z-0-9]

Offhand, since I haven't tested it, I don't know if this will produce
the same result. At a minimum, my expectation would be that it would
add the hyphen to the list of characters being removed, because I've
been able to include hyphens in a character class before, such as
[-0-9]. Since I'm guessing you want to allow users to use hyphens, I'm
thinking you don't want that to happen. On the other hand, if the
engine reads the extra hyphens as ranges (chr(8)-chr(11),
chr(12)-chr(14) and chr(31)-chr(38)), it could also add other legal
characters to the list of characters that are removed: 9, 10 and 13
(tab, line feed and carriage return) and 32-37 (the space and ! " # $
%). You'll have to test it to know exactly how it behaves.
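
One way to run that kind of test is to ask the engine which character codes the class actually matches. The sketch below does this with Python's re module, which is an assumption on my part: ColdFusion's engine may well parse the extra hyphens differently, so the same check would need to be repeated there with rereplace. The list of codes is taken from the expression as posted.

```python
import re

sample = "a-b_C 9!"

# The usual form removes letters and digits and leaves the hyphen alone.
plain = re.sub(r"[a-zA-Z0-9]", "", sample)

# With the extra hyphens, Python treats each one as a literal hyphen.
hyphenated = re.sub(r"[a-z-A-Z-0-9]", "", sample)

# Rebuild the posted class: every chr() value joined by a hyphen.
codes = [1, 8, 11, 12, 14, 28, 29, 31, 38]
posted = "[" + "-".join(chr(code) for code in codes) + "]"

# Collect every ASCII code that the posted class matches in Python.
removed = [code for code in range(128) if re.fullmatch(posted, chr(code))]

hyphen_removed = ord("-") in removed
tab_removed = 9 in removed
```

In Python, at least, the hyphen ends up in the removed set while tab, line feed, carriage return and codes 32-37 do not.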

hth

s. isaac dealey     434.293.6201
new epoch : isn't it time for a change?

add features without fixtures with
the onTap open source framework

http://www.fusiontap.com
http://coldfusion.sys-con.com/author/4806Dealey.htm


