On Jul 26, 2006, at 3:19 AM, Bill de hÓra wrote:
A. Pagaltzis wrote:
* Robert Sayre <[EMAIL PROTECTED]> [2006-07-26 01:45]:
On 7/25/06, Bill de hÓra <[EMAIL PROTECTED]> wrote:
And I didn't know whether Atom code could get away with
escaping < and &.
<atom:title type="html">&lt;b>&nbsp;hmm&lt;b></atom:title>
that is an XML fatal error, no doubt, as the ampersand before
"nbsp" must be escaped.
But he did say “escaping < and &”, so it would be. I’m not sure
what Bill’s question even is.
What do I escape, so I know what to unescape?
The point is that after your XML parser has unescaped the content of
the element, it should be suitable for handling as HTML. Escape
whatever you have to ensure that the consumer gets HTML from their
XML parser. Converting & to &amp;amp; and < to &amp;lt; is sufficient
(assuming that you've started with HTML--if you've started with plain
text, then you need to double escape, but in that case, you should be
using type="text" anyway to save yourself the trouble). You could
also convert > to &amp;gt;, " to &amp;quot;, ' to &amp;apos; and any other
characters to numeric character references. Or you can put the whole
thing in a CDATA section.
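
To make the round trip concrete, here is a minimal sketch in Python using only the standard library. The helper names (escaped_title, cdata_title) and the sample fragment are made up for illustration and are not from any Atom library; the point is just that either approach hands the consumer the original HTML once their XML parser has done its unescaping.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

ATOM_NS = "http://www.w3.org/2005/Atom"

def escaped_title(html_fragment):
    """Hypothetical helper: build an atom:title with type="html" by escaping."""
    # escape() rewrites & first, then < and >, so entity references already
    # present in the HTML (such as &nbsp;) survive as literal text.
    return '<atom:title xmlns:atom="%s" type="html">%s</atom:title>' % (
        ATOM_NS, escape(html_fragment))

def cdata_title(html_fragment):
    """Hypothetical helper: same element, but using a CDATA section."""
    # A CDATA section cannot contain "]]>", so split it across two sections.
    safe = html_fragment.replace("]]>", "]]]]><![CDATA[>")
    return '<atom:title xmlns:atom="%s" type="html"><![CDATA[%s]]></atom:title>' % (
        ATOM_NS, safe)

html_in = "<b>&nbsp;hmm</b>"

# What the consumer's XML parser returns as the element content.
from_escaped = ET.fromstring(escaped_title(html_in)).text
from_cdata = ET.fromstring(cdata_title(html_in)).text

print(escaped_title(html_in))
print(from_escaped == html_in, from_cdata == html_in)
```

If you start from plain text instead of HTML and still insist on type="html", you would call escape() twice (once to turn the text into HTML, once to get it through XML), which is the double escaping mentioned above.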