Siddhesh Sundar Toraskar created AXIS-2908:
----------------------------------------------
Summary: Apache Axis fails to handle non-Basic Multilingual Plane characters (U+10000 and above) while creating SOAP request
Key: AXIS-2908
URL: https://issues.apache.org/jira/browse/AXIS-2908
Project: Axis
Issue Type: Bug
Components: Serialization/Deserialization
Affects Versions: 1.4
Environment: OS - CentOS
             Software Platform - JDK 7
Reporter: Siddhesh Sundar Toraskar

While creating a SOAP request, if the content contains non-BMP characters (e.g. emoji), they are not inserted into the XML properly.

It seems that my content, which is UTF-8, is held as a UTF-16 Java String (the default) once the program receives it. The Apache Axis library that we are using then takes those UTF-16 Java Strings and tries to convert them back into UTF-8 to create the XML before sending. This fails whenever I send a 4-byte UTF-8 emoji character (:grin:).

I found that any 4-byte UTF-8 character is represented as a surrogate pair in UTF-16. I suspect that in this case the Axis serializer does not understand the surrogate pair and cannot convert it into a valid UTF-8 encoding. As a result, although UTF-8 is specified, these emoji appear in UTF-16 form, which corrupts them because they are then processed incorrectly.
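
To make the suspected failure mode concrete, here is a minimal Java sketch. It is not taken from the Axis source: the class and method names (SurrogatePairDemo, escapePerChar, escapePerCodePoint) are hypothetical, and the emoji is assumed to be U+1F601. It only shows how a serializer that walks a String one char at a time would split a surrogate pair into two invalid character references, whereas walking it by code point keeps the character whole.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names come from Axis.
class SurrogatePairDemo {

    // Naive escaper: treats every UTF-16 code unit (char) as a character.
    // A non-BMP character is emitted as two surrogate references, and
    // surrogates are not legal XML characters.
    // (Escaping of '<', '&' and so on is omitted for brevity.)
    static String escapePerChar(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Code-point-aware escaper: a surrogate pair is read as one code point
    // and written as a single numeric character reference.
    static String escapePerCodePoint(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            i += Character.charCount(cp); // 2 for a surrogate pair, else 1
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1F601 written as its UTF-16 surrogate pair.
        String grin = "\uD83D\uDE01";

        System.out.println(grin.length());                          // 2 chars
        System.out.println(grin.codePointCount(0, grin.length()));  // 1 code point

        // The JDK's own encoder produces the correct 4-byte UTF-8 sequence.
        StringBuilder hex = new StringBuilder();
        for (byte b : grin.getBytes(StandardCharsets.UTF_8)) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim());                  // F0 9F 98 81

        System.out.println(escapePerChar(grin));       // &#55357;&#56833;
        System.out.println(escapePerCodePoint(grin));  // &#128513;
    }
}
```

If Axis 1.4 does something similar to escapePerChar when writing the request, that would match the symptom described above. This is an assumption to be checked against the actual serialization code, not a confirmed diagnosis.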