Andreas Krantz created XERCESC-2130:
---------------------------------------
Summary: UTF-16 surrogate values 0xD800-0xDFFF can no longer be
written with Xerces 3.2.0
Key: XERCESC-2130
URL: https://issues.apache.org/jira/browse/XERCESC-2130
Project: Xerces-C++
Issue Type: Bug
Components: DOM
Affects Versions: 3.2.0
Reporter: Andreas Krantz
Priority: Critical
Attachments: reproduce.cpp
The fix for XERCESC-1854 introduced the method
{{DOMLSSerializerImpl::ensureValidString}},
whose validation is incorrect.
The method validates XMLCh values, which are UTF-16 code units.
[Valid Characters|https://www.w3.org/TR/REC-xml/#NT-Char] #x9 | #xA | #xD |
[#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
are the valid characters, expressed as UTF-32 code points.
The UTF-16 surrogate range 0xD800-0xDFFF is used to encode
[#x10000-#x10FFFF] and should not be treated as invalid.
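To illustrate why surrogate code units must be checked as a pair, here is a minimal standalone sketch. It does not use the Xerces-C++ API; all function names are hypothetical and {{char16_t}} stands in for XMLCh.

```cpp
// Hypothetical helpers for illustration only; not part of Xerces-C++.
constexpr bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
constexpr bool isLowSurrogate(char16_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

// Combine a high/low surrogate pair into the code point it encodes.
constexpr char32_t combineSurrogates(char16_t high, char16_t low) {
    return 0x10000
         + ((static_cast<char32_t>(high) - 0xD800) << 10)
         + (static_cast<char32_t>(low) - 0xDC00);
}

// XML 1.0 Char production, applied to a complete code point.
constexpr bool isXmlChar(char32_t cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD
        || (cp >= 0x20 && cp <= 0xD7FF)
        || (cp >= 0xE000 && cp <= 0xFFFD)
        || (cp >= 0x10000 && cp <= 0x10FFFF);
}
```

For example, the pair 0xD83D 0xDE00 combines to U+1F600, which the Char production accepts, while 0xD83D checked on its own is rejected.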
*The reader treats this correctly and does not complain, which leads to
asymmetric behavior:*
Reading DOM => OK
Save back DOM => Exception
The attached reproduce.cpp is meant to show the behavior.
The validation method used,
{{bool XMLChar1_1::isXMLChar(const XMLCh toCheck, const XMLCh toCheck2)}},
already takes an optional second parameter for checking surrogate values.
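A possible shape for a pair-aware validation loop is sketched below. This is not the actual {{ensureValidString}} code: the function names are hypothetical, {{char16_t}} stands in for XMLCh, and the sketch assumes C++14 or later for the constexpr loop.

```cpp
#include <cstddef>

// Hypothetical check for a single non-surrogate code unit (XML 1.0 Char, BMP part).
constexpr bool isBmpXmlChar(char16_t c) {
    return c == 0x9 || c == 0xA || c == 0xD
        || (c >= 0x20 && c <= 0xD7FF)
        || (c >= 0xE000 && c <= 0xFFFD);
}

// Returns true if every code unit is a valid XML character or part of a
// well-formed surrogate pair (which always encodes #x10000-#x10FFFF).
constexpr bool isValidXmlUtf16(const char16_t* s, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        const char16_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF) {
            // High surrogate: accept only if a low surrogate follows.
            if (i + 1 >= len) return false;
            const char16_t next = s[i + 1];
            if (next < 0xDC00 || next > 0xDFFF) return false;
            ++i; // consume both halves of the pair
        } else if (!isBmpXmlChar(c)) {
            // Lone low surrogates, 0xFFFE/0xFFFF, and disallowed controls end up here.
            return false;
        }
    }
    return true;
}
```

With this approach a string containing 0xD83D 0xDE00 passes, while a lone 0xD83D or a lone 0xDE00 is still rejected, so writing stays consistent with what the reader accepts.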