https://bz.apache.org/bugzilla/show_bug.cgi?id=57008
--- Comment #18 from Greg Woolsey <[email protected]> ---
I always go back to the standards doc when I get going around in circles. Here's what it says about escaped strings:

22.4.2.4 bstr (Basic String)

This element defines a binary basic string variant type, which can store any valid Unicode character. Unicode characters that cannot be directly represented in XML as defined by the XML 1.0 specification, shall be escaped using the Unicode numerical character representation escape character format _xHHHH_, where H represents a hexadecimal character in the character's value. [Example: The Unicode character 8 is not permitted in an XML 1.0 document, so it shall be escaped as _x0008_. end example]

To store the literal form of an escape sequence, the initial underscore shall itself be escaped (i.e. stored as _x005F_). [Example: The string literal _x0008_ would be stored as _x005F_x0008_. end example]

The possible values for this element are defined by the W3C XML Schema string datatype.

I think POI should assume it needs to escape Unicode when setting CT* class value strings, and unescape when reading them. I don't think POI should be attempting to unescape them at any other time than when reading a string value from a CT* class.
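
For illustration, here is a rough sketch of the escape-on-write / unescape-on-read rule described in 22.4.2.4. It is not POI code: the class name BstrEscapeSketch and its two methods are hypothetical, and it only assumes the _xHHHH_ format and the XML 1.0 character ranges. One deliberate choice that goes slightly beyond the quoted text: a literal underscore is escaped whenever it is followed by "x" and four hex digits, even without the closing underscore, so that it cannot be confused with an escape emitted right after it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of POI: sketches the _xHHHH_ rule from 22.4.2.4.
final class BstrEscapeSketch {
    // A complete escape sequence: _x, four hex digits, closing underscore.
    private static final Pattern ESCAPE = Pattern.compile("_x([0-9A-Fa-f]{4})_");
    // The start of something that could be read back as an escape sequence.
    private static final Pattern ESCAPE_PREFIX = Pattern.compile("_x[0-9A-Fa-f]{4}");

    private BstrEscapeSketch() {}

    // True if the code point matches the XML 1.0 "Char" production.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Apply when writing a value into the XML (e.g. when setting a string on a CT* bean).
    static String escape(String value) {
        StringBuilder out = new StringBuilder(value.length());
        Matcher prefix = ESCAPE_PREFIX.matcher(value);
        int i = 0;
        while (i < value.length()) {
            int cp = value.codePointAt(i);
            if (cp == '_' && prefix.region(i, value.length()).lookingAt()) {
                // Literal text that looks like an escape: escape the underscore itself.
                out.append("_x005F_");
            } else if (isXmlChar(cp)) {
                out.appendCodePoint(cp);
            } else {
                // Anything rejected by isXmlChar is below 0x10000, so four digits suffice.
                out.append(String.format("_x%04X_", cp));
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    // Apply only when reading a stored value back (e.g. from a CT* bean).
    static String unescape(String stored) {
        Matcher m = ESCAPE.matcher(stored);
        StringBuilder out = new StringBuilder(stored.length());
        int last = 0;
        while (m.find()) {
            out.append(stored, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        out.append(stored, last, stored.length());
        return out.toString();
    }
}
```

With this sketch, BstrEscapeSketch.escape("_x0008_") gives _x005F_x0008_, matching the second example in the spec text, and unescape() of that stored form returns the original literal. Doing the unescape in exactly one place (the read path) is what keeps a literal _x0008_ from being decoded twice.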
