Eddie Shipman wrote:
> READ THE RFC. Anything that is not escaped should be in
> a CDATASection, PERIOD. That was the question asked.

Given the following CDATA section:

  <![CDATA[&#13;&#10;]]>

That node contains 10 characters. None of them is a carriage return or line feed. That's the Delphi equivalent of this:

  data := '&#13;&#10;';

A 10-character-long string. But the data we're trying to get is only two characters, a carriage return and a line feed. To get those two characters from the CDATA section, we'd need to take the character data and send it _back_ through an XML interpreter to have it treat that sequence of 10 characters as two numeric character references.

A CDATA section is simply a way to avoid escaping lots of characters that the XML interpreter would otherwise treat specially. The example I gave above is *no different* from the following, which doesn't use a CDATA section:

  &amp;#13;&amp;#10;

The issue is with _encoding_, not _escaping_. The two characters in question have no special meaning in XML, so there's no reason to escape them. There is no way _to_ escape them. To output those characters, the XML serializer uses an identity transformation and puts those two literal characters on the output stream. What's desired, though, is for it to encode those characters differently, instead of encoding them as their literal values.

Suppose the output encoding is US-ASCII. The internal representation is of course Unicode. The serializer will normally write a carriage return as the single byte 0x0D when it needs to output that character. It has no reason to encode it any differently because a carriage return is a perfectly valid US-ASCII character and does not interfere with XML syntax.

If the character to output were U+2014, the em dash, then the serializer would have to output the seven-byte character sequence "&#8212;" instead. The em dash is not a valid US-ASCII character, so the serializer needs to _encode_ that character some other way. There's nothing to escape, though, since the em dash is not special in XML syntax.
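
For what it's worth, here is a rough Python sketch of the points so far (Python rather than Delphi only because its standard library keeps the check short). The element name `data` and the helper `serialize_text` are made up for illustration, and the two-step helper is an assumption about how a serializer might separate escaping from encoding, not a description of any particular one.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Parsing: the CDATA section and the escaped form carry the same 10 characters.
cdata_text = ET.fromstring("<data><![CDATA[&#13;&#10;]]></data>").text
escaped_text = ET.fromstring("<data>&amp;#13;&amp;#10;</data>").text

# Only real character references are turned into CR and LF by the parser.
charref_text = ET.fromstring("<data>&#13;&#10;</data>").text


def serialize_text(text: str, encoding: str = "us-ascii") -> bytes:
    """Hypothetical helper: escape for XML syntax, then encode for the charset."""
    # Escaping: only characters with special meaning in XML (&, <, >) change.
    escaped = escape(text)
    # Encoding: characters the charset lacks become numeric character references.
    return escaped.encode(encoding, errors="xmlcharrefreplace")


cr_bytes = serialize_text("\r")            # valid US-ASCII, not special in XML
em_dash_bytes = serialize_text("\u2014")   # not representable in US-ASCII

print(len(cdata_text), cdata_text == escaped_text)
print(repr(charref_text))
print(cr_bytes, em_dash_bytes)
```

Here `cdata_text` and `escaped_text` come back as the same 10-character string, `charref_text` is the two-character CR LF pair, the carriage return is written as the single byte 0x0D, and the em dash comes out as a seven-byte character reference.
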
The serializer could _not_ use a CDATA section to contain the em dash because there is no way to represent that character in US-ASCII without using special XML characters to _encode_ it.

If the character to output were U+0026, the ampersand, then the serializer would output the five-byte character sequence "&amp;" in its place. The ampersand is a valid US-ASCII character, but since it also has special meaning in XML, the serializer needs to use some other way of encoding that character instead of using the literal value. If the serializer instead chose to use a CDATA section for that character, it could output the 13-byte character sequence "<![CDATA[&]]>", but that would be wasteful.

-- 
Rob
_______________________________________________
Delphi mailing list
http://www.elists.org/mailman/listinfo/delphi

