S. Isaac Dealey wrote:
>>Ok, I've got a little problem here. I'm reading an XML file from a third
>>party and displaying its content. The problem is that the third party
>>is not checking for illegal characters in the XML file. So things like:
>>
>><news>He & me are</news>
>>
>>will show up in the damn thing. So I want to replace the special
>>characters, but only those that are outside of the tags. I've probably
>>got to use a regexp for this, but I'm not sure how to do this. I know I
>>can select part of a string with a regexp and replace it with a changed
>>version of that string, but how is that done efficiently, and in one
>>REReplace (I know it can be done, but don't know how)?
>>
>>Anyone?
>>
>>Jesse
>
> Unfortunately, while you can use back-references to return a portion of a
> found regular expression back to the replacement, you can't use any kind of
> functions or conditional logic on these back-references, so you'd have to
> replace each character individually... As for actually getting the illegal
> characters, try something like this:
>
> <cfset illegalchar = REFind(">[^<]*?[^ _.[:alnum:]-][^<]*?",myxmlpacket)>
>
> This should give you the location of the first illegal character in the
> packet, within the contents of an element, assuming that an illegal
> character is anything other than a space, underscore, hyphen, dot or
> alpha-numeric character... That's probably not a real good definition for
> illegal characters, but it's a starting point. :)
>
> Once you know where that character is, then you can replace it with
> something like <char=#asc(illegalcharacter)#> or whatever the spec. is for
> special characters in your xml dtd. Am I using the terminology correctly?
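To make the suggestion above concrete, here is a minimal sketch, written in Python rather than ColdFusion so it can be run on its own. It assumes the same rough definition of "illegal" as above (anything in element content other than a space, underscore, hyphen, dot or alphanumeric character). The names first_illegal_char and char_reference are made up for the illustration, and the &#N; form is XML's numeric character reference, not anything specific to the feed's DTD.

```python
import re

# '>' closing a tag, then element content up to the first character that is
# not a space, underscore, dot, hyphen or ASCII alphanumeric. '<' is excluded
# from the class so the match cannot run into the next tag.
# Assumption: ASCII letters and digits stand in for [:alnum:].
ILLEGAL_IN_CONTENT = re.compile(r">[^<]*?([^ _.\-A-Za-z0-9<])")


def first_illegal_char(packet):
    """Return (index, character) of the first illegal character, or None."""
    match = ILLEGAL_IN_CONTENT.search(packet)
    if match is None:
        return None
    return match.start(1), match.group(1)


def char_reference(ch):
    """Build an XML numeric character reference such as '&#38;'."""
    return "&#%d;" % ord(ch)


print(first_illegal_char("<news>He & me are</news>"))  # (9, '&')
print(char_reference("&"))                             # &#38;
```

Note that under this definition a line break inside an element also counts as illegal, which is one more reason to treat it only as a starting point.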
Ok, I found a solution. It works fine, but could probably use a bit of
optimization. I first check whether the document is valid, though, and
only clean it up when it is not, so the impact should not be too high:
they usually DO give us valid XML to parse.
The solution is this:
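<cfscript>
// Escape everything first: & < > " become &amp; &lt; &gt; &quot;
ct=HTMLEditFormat(cfhttp.filecontent, -1);
// Restore the angle brackets
ct=Replace(ct, "&lt;", "<", "ALL");
ct=Replace(ct, "&gt;", ">", "ALL");
// Restore quotes inside tags, one per tag per pass, until nothing changes
newct="";
while (ct neq newct){
newct=ct;
ct=REReplaceNoCase(ct, "(<[^>&]*)&quot;([^>]*>)", "\1""\2", "ALL");
}
// Restore entities that were valid in the source, e.g. &amp;amp; back to &amp;
ct=REReplace(ct, "&amp;([a-zA-Z]*);", "&\1;", "ALL");
</cfscript>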
And it works like a charm :)
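For the ampersand case in particular, a single pass may be enough: escape every & that does not already start an entity or character reference. Below is a minimal sketch of that idea, written in Python rather than ColdFusion so it can be run on its own. The name escape_bare_ampersands is made up for the illustration, and the pattern relies on negative lookahead, which may not be available in the regex engine of every ColdFusion version. It does nothing about stray < characters in the content.

```python
import re

# An '&' that is NOT followed by a named entity (&amp;), a decimal
# reference (&#38;) or a hexadecimal reference (&#x26;).
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(xml_text):
    """Replace every bare '&' with '&amp;', leaving existing entities alone."""
    return BARE_AMP.sub("&amp;", xml_text)


print(escape_bare_ampersands("<news>He & me are</news>"))
# <news>He &amp; me are</news>
```

Text that merely looks like an entity, such as the &T; in AT&T;, is left untouched, so this only helps with the common case of a bare ampersand.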
Jesse