[ http://issues.apache.org/jira/browse/AXIS-2025?page=comments#action_12314564 ]
Shankar Unni commented on AXIS-2025:
------------------------------------

> have them be replaced by entity escapes...

Hmm, perhaps not. But the larger problem still stands - every other transport mechanism handles arbitrary characters in strings. Do the standards simply *punt* on some subset of strings?

I see in the SOAP document (3.1.2, Encoding simple values) that *it* basically shrugs and says at this point: "Note that certain Unicode characters cannot be represented in XML". So perhaps the true answer is not to use xsd:string for such strings, but encode them as binary somehow? The problem is that there is little or no guidance for users in this matter.

As an aside: I'm also looking at http://www.xmlrpc.com/spec#update1, where I see the interesting lines:

"What characters are allowed in strings? Non-printable characters? Null characters? Can a "string" be used to hold an arbitrary chunk of binary data?

Any characters are allowed in a string except < and &, which are encoded as &lt; and &amp;. A string can be used to encode binary data."

There's got to be *some* way to pass such strings around.. Most applications don't have full control of how such strings are created anyway, and this is an almost intolerable restriction..

> Illegal XML characters in String arguments and return values cause XML exceptions in Axis calls
> -----------------------------------------------------------------------------------------------
>
>          Key: AXIS-2025
>          URL: http://issues.apache.org/jira/browse/AXIS-2025
>      Project: Apache Axis
>         Type: Bug
>   Components: Serialization/Deserialization
>     Versions: 1.2
>  Environment: All (but reproduced on WinXP).
> Axis 1.1 and 1.2
>     Reporter: Shankar Unni
>     Assignee: Venkat Reddy
>  Attachments: Axis1.1badmsgAPI.log, Axis1.1echoAPI.log, Axis1.2badmsgAPI.log, Axis1.2echoAPI.log
>
> Arguments and return values of Java type String are incorrectly handled if they contain non-printing illegal ASCII characters.
> Example 1: bad return values:
> - - - - - - - - - - - - - - -
> E.g. the string
>    "bad char: " + (char)3 + "."
> Trivial example:
> foo.jws:
>   public class foo {
>     public String badmsg()
>     {
>       return "bad: " + (char)3 + ".";
>     }
>   }
> When calling this method and the server is running on Axis 1.1, it returns XML with the illegal character ASCII "3" in the text:
>    <badmsgReturn xsi:type="xsd:string">bad: ?.</badmsgReturn>
> This causes an XML parse exception on the client side ("org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x3) was found in the element content of the document.")
> With Axis 1.2, the server doesn't even return a valid response: I get an HTTP 200 OK with empty content, causing a different XML parse error.
>
> Example 2: bad parameter values:
> - - - - - - - - - - - - - - - -
> A similar problem exists when passing such a string from the client side. If I have a method in foo.jws:
>   public class foo {
>     public String echo(String s)
>     {
>       return s;
>     }
>   }
> Then if I write an ordinary Java client to call this, and pass it a bad string as in the beginning of this post, I get an exception thrown while the call is being composed:
>    java.lang.IllegalArgumentException: The char '0x3' in 'bad char: ?.' is not a valid XML character.
> This is somewhat absurd: shouldn't the serialization layer be encoding these illegal XML characters as entity escapes? They're entirely legal in the current locale (US), and normal Java code handles this character quite normally. Why should it croak when passed by XML/RPC?

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
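
For illustration only: the "encode them as binary somehow" idea from the comment above could be sketched on the application side roughly as follows. This is not Axis code and not a proposed fix. The class name XmlSafeText and its methods are hypothetical, the character ranges follow the Char production of XML 1.0, and a Java 8 or later JDK is assumed (for java.util.Base64 and String.codePoints). A caller would check a string before handing it to the SOAP layer and, if it is not representable, send the Base64 text instead (for example as xsd:base64Binary), decoding it on the other side.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper (not part of Axis): detect strings XML 1.0 cannot carry. */
final class XmlSafeText {

    private XmlSafeText() {}

    /** True if the code point is allowed by the XML 1.0 "Char" production. */
    public static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** True if every code point of s can appear literally in an XML 1.0 document. */
    public static boolean isXmlSafe(String s) {
        // Unpaired surrogates show up here as values in 0xD800..0xDFFF and are rejected.
        return s.codePoints().allMatch(XmlSafeText::isXmlChar);
    }

    /** Encode the string's UTF-8 bytes as Base64 text, which is always XML-safe. */
    public static String toBase64(String s) {
        // Caveat: unpaired surrogates cannot be encoded as UTF-8 and are replaced by '?'.
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    /** Reverse of toBase64. */
    public static String fromBase64(String b64) {
        return new String(Base64.getDecoder().decode(b64), StandardCharsets.UTF_8);
    }
}
```

With the string from Example 1, isXmlSafe("bad char: " + (char)3 + ".") is false, so the caller would transmit toBase64(...) of it and apply fromBase64 on receipt. The obvious cost is that both ends have to agree on this convention out of band, which is exactly the lack of guidance complained about above.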
