I didn't know this, so I imagine others might not. The string "&#0;" is
invalid XML. The character is simply not allowed in XML in any
representation. The XML 1.0 standard blocks most characters below x20,
allowing only x9, xA, and xD. XML 1.1 allows x1-x1F (the control characters
among them only as character references), but still blocks x0.
http://www.w3.org/TR/1998/REC-xml-19980210#charsets
http://www.w3.org/TR/xml11/#charsets
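
For reference, here is a minimal sketch of the two Char productions as Java
predicates. The class and method names are made up for illustration; this is
not code from Surefire.

```java
// Hypothetical helper, not part of Surefire: the Char productions from
// section 2.2 of the XML 1.0 and XML 1.1 specifications.
final class XmlChars {
    private XmlChars() {
    }

    /** True if the code point may appear in an XML 1.0 document. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * True if the code point may appear in an XML 1.1 document.
     * Note that most control characters in this range are only
     * allowed when written as character references.
     */
    static boolean isXml11Char(int cp) {
        return (cp >= 0x1 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }
}
```

Under these rules the bell character is rejected by 1.0 but accepted by 1.1,
and the null character is rejected by both.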
This creates an interesting problem for serializing Java strings
containing the null character, e.g. "\u0000", or for other non-whitespace
control characters like the bell character "\u0007". We've got an
integration test for this case in Surefire, and it does entirely the wrong
thing (SUREFIRE-455).
In the patch submitted to that bug, Todor throws away nulls in his XML
escaper, silently omitting them from the output; all other control
characters (even the 1.0-illegal ones) pass through. That doesn't seem
right, especially when we're talking about test results! (Expected "" but
was "" ... Just imagine how painful it would be to track something like
that down.)
But neither does it seem right to insert "&#0;" when it's illegal XML.
Notably, Java will cheerfully print &#0; in XML if you tell it to do so,
and many parsers will figure out what to do with it just fine; the same
applies to "&#7;".
Thoughts? Should we emit "&#0;", standards-be-damned? Silently omit the
character? Print a "?" instead? Something else?
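
To make the options concrete, here is a rough sketch of an escaper that takes
the policy as a parameter. Everything in it (the class, the enum, the method
names) is hypothetical and for discussion only; it is neither the current
Surefire code nor the patch attached to SUREFIRE-455.

```java
// Hypothetical sketch for discussion; not the Surefire escaper.
final class XmlEscapeSketch {
    /** What to do with a character that XML 1.0 does not allow. */
    enum IllegalCharPolicy { EMIT_REFERENCE, OMIT, QUESTION_MARK }

    private XmlEscapeSketch() {
    }

    /** Char production from section 2.2 of the XML 1.0 specification. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    static String escape(String text, IllegalCharPolicy policy) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isXml10Char(cp)) {
                // Legal character: escape markup, copy everything else.
                switch (cp) {
                    case '<': out.append("&lt;"); break;
                    case '>': out.append("&gt;"); break;
                    case '&': out.append("&amp;"); break;
                    case '"': out.append("&quot;"); break;
                    default: out.appendCodePoint(cp);
                }
            } else {
                // Illegal character: apply whichever option we settle on.
                switch (policy) {
                    case EMIT_REFERENCE:
                        // Not well-formed XML 1.0, but visible to the reader.
                        out.append("&#").append(cp).append(';');
                        break;
                    case OMIT:
                        break;
                    case QUESTION_MARK:
                        out.append('?');
                        break;
                }
            }
        }
        return out.toString();
    }
}
```

With the input "a\u0000b", the three policies give "a&#0;b", "ab", and "a?b"
respectively. Only the first keeps the information about which character was
there, and only the other two produce well-formed XML.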
-Dan