Thomas Gentsch created TUSCANY-4075:
---------------------------------------
Summary: SDO C++ handling of XML CDATA and escaped chars inconsistent
Key: TUSCANY-4075
URL: https://issues.apache.org/jira/browse/TUSCANY-4075
Project: Tuscany
Issue Type: Bug
Components: C++ SDO
Affects Versions: Cpp-M3
Reporter: Thomas Gentsch

We are using both C++ and Java SDO in a project and discovered some misbehavior in the C++ components when XML data is converted from/to SDO and the XML contains either escaped chars or CDATA. Java seems to do it mostly right (see below).

When looking at the SDO (C++ M3) code and searching on the web (e.g. [1]), it looks as if this topic is a bit, well, incomplete in the C++ world.

The problem (C++):
- Loading an XML with CDATA inside works nicely; the CDATA remains intact, therefore saving works nicely too. However, if I do a DataObjectPtr->getCString(), I get the CDATA in the returned value, which means that as a user I have to deal with that :-|
- Loading an XML with escaped chars (e.g. &lt;) works too; libxml2 converts these chars. getCString() returns the real text (e.g. "<"), but saving does not re-insert the escaping, i.e. the resulting XML is not usable anymore (TUSCANY-1553).

In Java this looks much better and quite as I'd expect it to:
- Loading XML with either construct works.
- Using getCString() just returns the real text with the escaped sections converted.
- Saving works too. CDATA sections are lost; they are converted to escaped XML instead. This is not the *original* XML anymore, but at least it is valid and logically the same as the input.
- Example:
  Input XML: <tns1:name>ü&lt;&gt;bla blub <![CDATA[ <<>> ]]></tns1:name>
  getCString() in Java: "ü<>bla blub <<>> "
  Saving this as XML: <tns1:name>ü&lt;&gt;bla blub &lt;&lt;&gt;&gt; </tns1:name>

The only questionable thing is the saved "ü": should it be converted back to an escaped form or left as a literal ü?

Anyway, now the question: as it seems there were discussions going on when SDO C++ was implemented, has the approach above (as in Java) ever been considered and, if so, why has it not been followed?
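
To make the expected mapping concrete, here is a minimal sketch in plain C++ of the two transformations a user has to apply by hand today. Both helper names (unwrapCData, escapeXmlText) are hypothetical and not part of the SDO API, and the sketch assumes that getCString() hands back the CDATA markers literally, as described above.

```cpp
#include <string>

// Hypothetical helper (not part of SDO): removes CDATA wrappers from a
// value and keeps the wrapped content verbatim.
std::string unwrapCData(const std::string& in) {
    static const std::string open = "<![CDATA[";
    static const std::string close = "]]>";
    std::string out;
    std::size_t pos = 0;
    while (true) {
        std::size_t start = in.find(open, pos);
        if (start == std::string::npos) {
            // No further CDATA section: copy the rest and stop.
            out.append(in, pos, std::string::npos);
            break;
        }
        out.append(in, pos, start - pos);
        std::size_t contentStart = start + open.size();
        std::size_t end = in.find(close, contentStart);
        if (end == std::string::npos) {
            // Unterminated section: keep the remainder untouched.
            out.append(in, start, std::string::npos);
            break;
        }
        out.append(in, contentStart, end - contentStart);
        pos = end + close.size();
    }
    return out;
}

// Hypothetical helper (not part of SDO): escapes the characters that are
// special inside XML element text.
std::string escapeXmlText(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += c;       break;
        }
    }
    return out;
}
```

A caller would run unwrapCData() on a value read via getCString() and escapeXmlText() on a value before it is written out as XML; with the Java-style behavior neither step would be the user's job.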
I believe that this would also have been much simpler than it is today:
- While parsing, the CDATA handler function of the SAX2 handler just appends the text returned by libxml2.
- Escaped chars are converted by libxml2 anyway.
- The property value now contains the real text (e.g. "ü<>bla blub <<>> ") and getCString() returns it as-is.
- Setting that property also just sets the passed-in value.
- Saving the property just calls libxml2's xmlTextWriterWriteString(), which should escape the special chars.

Another advantage is that users don't need to worry about (un)escaping special chars or CDATA, as they do today.

Disadvantage: API behavior changes.