RE: Cleaning XML - Unicode 0x0 SOLVED sorta

Matt Quackenbush Tue, 07 Nov 2006 12:18:20 -0800

Josh,

I think the point that Rob and others were making is that your data should
be validated and cleaned up BEFORE being inserted into the database;
whether it's inserted as XML or not is completely irrelevant.
If you didn't have invalid data in the database, you wouldn't have
invalid data in your XML.  But since the data obviously is NOT being
validated and cleaned up before it goes into the db, the best, most scalable,
and most widely accepted "good practice" would be to use CDATA in your XML.
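For illustration, here is a minimal Python sketch of wrapping a field value in a CDATA section when building XML by hand. The helper name `to_cdata` is hypothetical and does not come from the thread. CDATA only spares you from escaping markup characters such as `<` and `&`. It does not make characters that XML 1.0 forbids everywhere (such as U+0000) legal, so the sketch assumes those have already been removed. The one sequence a CDATA section cannot contain is `]]>`, so the helper splits it across two adjacent sections.

```python
def to_cdata(value: str) -> str:
    """Wrap text in a CDATA section (hypothetical helper).

    "]]>" would end the section early, so each occurrence is split
    across two adjacent CDATA sections ("]]" in one, ">" in the next).
    This does NOT remove characters XML 1.0 forbids everywhere,
    such as U+0000; those must be stripped separately.
    """
    return "<![CDATA[" + value.replace("]]>", "]]]]><![CDATA[>") + "]]>"


if __name__ == "__main__":
    print(to_cdata("Fish & Chips <b>"))
    # <![CDATA[Fish & Chips <b>]]>
```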

Again, though, what you're doing is just a band-aid that covers up the real
issue: invalid data being entered into the database.

Thanks,

Matt

-----Original Message-----
From: Josh Nathanson [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, November 07, 2006 1:14 PM
To: CF-Talk
Subject: Re: Cleaning XML - Unicode 0x0 SOLVED sorta

OK, I added this to my regex:

\x00

which is the hex escape for the null character (code point 0, not the
digit "0").  And it worked.

I'm not sure why chr(0) didn't work.

Yes, it's not scalable... but since the data isn't going into the database
as XML, just as plain old form fields, I can't use CDATA on the way in anyway,
correct?  I would have to run the same regex on each incoming text form
field... so I guess this way is more scalable than that.
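Cleaning on the way in doesn't have to mean a growing regex per field. Below is a minimal Python sketch under stated assumptions. It assumes the submitted fields are available as a dict of name to value. The names `clean_text` and `clean_form_fields` are hypothetical. The pattern is one negated character class built from the XML 1.0 `Char` production rather than anything from the thread, and ColdFusion's regex engine may not accept the same escapes. Because it covers every character XML 1.0 excludes at once (including the `\x00` that fixed Josh's case), it addresses Rob's concern about the pattern growing one character at a time.

```python
import re

# Everything XML 1.0 allows (its "Char" production): tab, LF, CR,
# and U+0020..U+D7FF, U+E000..U+FFFD, U+10000..U+10FFFF.
# The negated class matches any character outside that set: NUL and the
# other C0 controls, lone surrogates, U+FFFE and U+FFFF.
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_text(value: str) -> str:
    """Strip characters that can never appear in an XML 1.0 document."""
    return INVALID_XML_CHARS.sub("", value)


def clean_form_fields(fields: dict) -> dict:
    """Return a copy of the submitted fields with every string value cleaned.

    Non-string values (numbers, None, ...) pass through unchanged.
    """
    return {
        name: clean_text(value) if isinstance(value, str) else value
        for name, value in fields.items()
    }


if __name__ == "__main__":
    form = {"firstName": "Jo\x00sh", "notes": "line1\nline2\x0b", "qty": 3}
    print(clean_form_fields(form))
    # {'firstName': 'Josh', 'notes': 'line1\nline2', 'qty': 3}
```

Stripping silently is one design choice. Rejecting the submission instead would surface the problem to whoever is sending the bad data.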

-- Josh

----- Original Message -----
From: "Rob Wilkerson" <[EMAIL PROTECTED]>
To: "CF-Talk" <[email protected]>
Sent: Tuesday, November 07, 2006 10:19 AM
Subject: Re: Cleaning XML - Unicode 0x0

> On 11/7/06, Josh Nathanson <[EMAIL PROTECTED]> wrote:
>> Thanks for your help Rob.  I just don't know which field is the culprit as
>> far as the null character goes (there's no description field or anything
>> obvious like that), and I'm hesitant to CDATA every single field that's
>> going into the db unless I've exhausted every other possible option.
>
> I wouldn't apply a CDATA block to every field indiscriminately, but I
> would apply it to varchar and text fields where the data is likely to
> be quite variable.
>
>> I'll keep grinding on trying to regex the null character out of there and
>> let the list know if I figure anything out.
>
> The problem with this approach is that while it's currently the null
> character, next time it might be something else and then something
> else.  Your regex could just continue to grow.  I guess what I'm
> saying is that it's not really a scalable solution.
>
> Handling invalid characters in a batch manner, either by wrapping them in
> a CDATA block or by understanding how those characters are being
> inserted, is a more workable long-term solution.
>
> That said, adding this final character may turn out to be the last you
> ever hear of this particular problem.  :-)
>
> 
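Rob's other suggestion, understanding how the characters get in, starts with Josh's open question of which field holds the null character. Here is a minimal Python sketch, assuming a row or form submission is available as a dict. `find_control_chars` is a hypothetical name. It only reports C0 control characters other than tab, LF and CR, which are the ones XML 1.0 forbids outright. It does not check every character XML excludes.

```python
def find_control_chars(fields: dict) -> list:
    """List (field name, index, code point) for each C0 control character
    other than tab, LF and CR found in the string values of `fields`."""
    hits = []
    for name, value in fields.items():
        if not isinstance(value, str):
            continue  # only text fields can carry stray control characters
        for pos, ch in enumerate(value):
            if ord(ch) < 0x20 and ch not in "\t\n\r":
                hits.append((name, pos, f"U+{ord(ch):04X}"))
    return hits


if __name__ == "__main__":
    row = {"title": "ok", "address": "12 Main St\x00", "zip": 90210}
    for name, pos, code in find_control_chars(row):
        print(f"{name}: {code} at index {pos}")
    # address: U+0000 at index 10
```

Logging these hits alongside the page or import that submitted the row is one way to trace where the bad data comes from.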

Archive: 
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:259505
