XOPAwareStAXOMBuilder / MTOMStAXSOAPModelBuilder should use UTF-8 to decode cid: URIs
-------------------------------------------------------------------------------------

                 Key: WSCOMMONS-429
                 URL: https://issues.apache.org/jira/browse/WSCOMMONS-429
             Project: WS-Commons
          Issue Type: Bug
            Reporter: Andreas Veithen
            Assignee: Andreas Veithen
            Priority: Minor

XOPAwareStAXOMBuilder and MTOMStAXSOAPModelBuilder use the document's charset encoding to decode cid: URIs (see the usage of URLDecoder.decode in ElementHelper#getContentID). However, as explained in [1] (referenced by the definition of the anyURI type), %HH escaping should always be done using UTF-8.

Since non-ASCII characters are not allowed in content IDs, this is only an issue if the document uses a charset encoding that is not a superset of ASCII (e.g. UTF-16). Most of the characters that require %HH encoding are also not allowed (or are unusual) in content IDs. Therefore this is a minor issue.

Note that the unit test MTOMStAXSOAPModelBuilderTest#testUTF16MTOMMessage specifically tests this incorrect behavior. The test should therefore be corrected or removed entirely.

[1] http://www.w3.org/TR/2001/WD-charmod-20010126/#sec-URIs
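
The difference described above can be reproduced with a small standalone sketch. It is not the actual ElementHelper#getContentID code: the class name, the helper contentIdFromHref and the content ID value are all hypothetical and only illustrate how the charset passed to URLDecoder.decode changes the result for the same cid: URI.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical illustration only; not taken from the Axiom code base.
class CidDecodingSketch {

    // Strips the "cid:" scheme and percent-decodes the remainder using the
    // given charset. The issue proposes that this charset always be UTF-8.
    public static String contentIdFromHref(String href, String charset) {
        if (!href.startsWith("cid:")) {
            throw new IllegalArgumentException("Not a cid: URI: " + href);
        }
        try {
            return URLDecoder.decode(href.substring("cid:".length()), charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // Made-up content ID in which ':' is escaped as %3A.
        String href = "cid:1.urn%3Auuid%3A1234@example.org";

        // Decoding with UTF-8 maps each %3A back to ':'.
        System.out.println(contentIdFromHref(href, "UTF-8"));

        // Decoding with the document charset (here UTF-16) treats the single
        // byte 0x3A as UTF-16 input. UTF-16 has no one-byte code units, so
        // the escape cannot come back as ':' and the content ID is corrupted.
        System.out.println(contentIdFromHref(href, "UTF-16"));
    }
}
```

With a document charset that is a superset of ASCII (UTF-8, ISO-8859-1, and so on) both calls would agree for this input, which is why the problem only shows up for encodings such as UTF-16.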