Dawid Weiss wrote:
>> We should not drop the offending characters, but escape them. Either
>> the numeric character reference (&#nn;) or the CDATA way is OK (and the
>> CDATA way is simpler).
>
> This isn't entirely true, Andrzej -- escaping a character or putting it
> in a CDATA section are just different ways of expressing the same
> character code in an XML document. That character code is still ILLEGAL
> under the XML spec (there is a fragment of it specifying the legal
> character ranges), so a conforming XML parser should throw an exception
> if it encounters anything outside the legal range. The only way of
> transferring arbitrary binary data is to encode it as legal Unicode
> characters (using uuencode or similar).
>
> I agree with the person who submitted this patch that it is a potential
> issue and should be addressed somehow.
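The "fragment specifying legal character ranges" is the XML 1.0 `Char` production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. Below is a minimal Java sketch (the class and method names are hypothetical, not from any existing codebase) that checks text against those ranges. It also uses Base64, swapped in here for uuencode only because the JDK ships `java.util.Base64`, to show that encoding the raw bytes yields text made entirely of legal characters.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper illustrating the XML 1.0 legal character ranges.
class XmlCharCheck {

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** True if every code point in the string is legal XML 1.0 text. */
    static boolean isLegalXmlText(String s) {
        // Lone surrogates come through codePoints() as 0xD800-0xDFFF,
        // which fall outside the legal ranges and are rejected.
        return s.codePoints().allMatch(XmlCharCheck::isLegalXmlChar);
    }

    public static void main(String[] args) {
        // U+0001 is illegal, so neither &#1; nor a CDATA section can carry it.
        String text = "abc\u0001def";
        System.out.println(isLegalXmlText(text)); // false

        // Encoding the bytes as Base64 gives only A-Z, a-z, 0-9, '+', '/', '='.
        String encoded = Base64.getEncoder()
            .encodeToString(text.getBytes(StandardCharsets.UTF_8));
        System.out.println(encoded);
        System.out.println(isLegalXmlText(encoded)); // true
    }
}
```

The receiver has to know to decode the field (for example via an attribute or schema type), which is the price of carrying a full binary payload.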
Right, I didn't think about this... somehow I thought this was all about
special characters like ' " & <.

Then we should take the best of both worlds: escape the special characters
that are legal, and replace the illegal ones with '?', a space, or nothing.
I know a place where we could find some inspiration (Carrot2
XMLSerializerHelper.java ... ;-) )
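A minimal Java sketch of that "best of both worlds" approach, assuming XML 1.0 rules. The names are hypothetical, and this is not the Carrot2 XMLSerializerHelper code. Legal markup-significant characters are escaped as entity references, and code points outside the legal ranges are replaced with a caller-chosen string ("?", " ", or "" to drop them).

```java
// Hypothetical sanitizer: escape what is legal, replace what is not.
class XmlSanitizer {

    /** XML 1.0 "Char" production: the legal character ranges. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Escapes < > & " ' and replaces every illegal code point with
     * the given replacement string.
     */
    static String sanitize(String s, String replacement) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            switch (cp) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break; // also guards "]]>"
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (isLegalXmlChar(cp)) {
                        out.appendCodePoint(cp);  // legal: copy as-is
                    } else {
                        out.append(replacement);  // illegal: substitute
                    }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Contains NUL and vertical tab, both illegal in XML 1.0.
        String raw = "a < b & \"c\"\u0000\u000B end";
        System.out.println(sanitize(raw, "?"));
        System.out.println(sanitize(raw, ""));
    }
}
```

Whether to use '?', a space, or nothing is a policy choice: '?' makes the loss visible, while dropping the character keeps the text compact but can silently join adjacent words.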
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com