Siddhesh Sundar Toraskar created AXIS-2908:
----------------------------------------------
Summary: Apache Axis fails to handle non-Basic Multilingual Plane
characters (U+10000 and above) while creating a SOAP request
Key: AXIS-2908
URL: https://issues.apache.org/jira/browse/AXIS-2908
Project: Axis
Issue Type: Bug
Components: Serialization/Deserialization
Affects Versions: 1.4
Environment: OS - CentOS
Software Platform - JDK 7
Reporter: Siddhesh Sundar Toraskar
While creating a SOAP request, if the content contains non-BMP characters
(e.g. emoji), they are not inserted correctly into the XML.
My content is UTF-8 encoded, but once the program receives it, it is stored in
a Java String, which Java represents internally as UTF-16.
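A small Java sketch (not part of the original report) makes the UTF-16 representation concrete. It uses U+1F601 as an example 4-byte UTF-8 character; the class and method names are hypothetical, and only standard JDK 7 APIs are used.

```java
import java.nio.charset.StandardCharsets;

public class SurrogateDemo {
    // UTF-8 encoding of U+1F601, an example 4-byte character: F0 9F 98 81
    static final byte[] GRIN_UTF8 = {(byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x81};

    // Decode UTF-8 bytes into a Java String, which stores UTF-16 code units
    static String decode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = decode(GRIN_UTF8);
        System.out.println(s.length());                       // 2 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length()));  // 1 code point
        System.out.printf("%04X %04X%n", (int) s.charAt(0), (int) s.charAt(1)); // D83D DE01
        System.out.println(Character.isSurrogatePair(s.charAt(0), s.charAt(1))); // true
    }
}
```

One character therefore occupies two `char` values in the String, and any code that walks the String one `char` at a time sees two unrelated-looking halves rather than one character.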
The Apache Axis library we are using then takes those UTF-16 Java Strings and
tries to convert them back to UTF-8 to build the XML before sending. This fails
whenever I send a 4-byte UTF-8 character such as the :grin: emoji. I found that
every 4-byte UTF-8 character is represented in UTF-16 as a surrogate pair (two
16-bit code units). I suspect that in this case Axis does not recognize the
surrogate pair and therefore cannot convert it into a valid UTF-8 sequence.
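To illustrate the suspected failure mode, the following Java sketch contrasts a UTF-8 writer that first combines surrogate pairs into code points with a hypothetical writer that encodes each UTF-16 `char` on its own. This is not Axis's actual code; both encoders are illustrative assumptions. The per-`char` variant emits each surrogate as a separate 3-byte sequence (the CESU-8 form), which is not valid UTF-8.

```java
import java.io.ByteArrayOutputStream;

public class Utf8Writer {
    // Correct approach: join surrogate pairs into one code point, then emit 1-4 bytes
    static byte[] encodeByCodePoint(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);      // returns the full code point for a high+low pair
            i += Character.charCount(cp);   // advance by 1 or 2 chars
            writeUtf8(out, cp);
        }
        return out.toByteArray();
    }

    // Hypothetical buggy approach: treat every UTF-16 char as a code point,
    // so each surrogate becomes its own 3-byte sequence (6 bytes, invalid UTF-8)
    static byte[] encodeByChar(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            writeUtf8(out, s.charAt(i));
        }
        return out.toByteArray();
    }

    // Standard UTF-8 bit layout for a single code point
    static void writeUtf8(ByteArrayOutputStream out, int cp) {
        if (cp < 0x80) {
            out.write(cp);
        } else if (cp < 0x800) {
            out.write(0xC0 | (cp >> 6));
            out.write(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out.write(0xE0 | (cp >> 12));
            out.write(0x80 | ((cp >> 6) & 0x3F));
            out.write(0x80 | (cp & 0x3F));
        } else {
            out.write(0xF0 | (cp >> 18));
            out.write(0x80 | ((cp >> 12) & 0x3F));
            out.write(0x80 | ((cp >> 6) & 0x3F));
            out.write(0x80 | (cp & 0x3F));
        }
    }

    public static void main(String[] args) {
        String grin = new String(Character.toChars(0x1F601));
        System.out.println(encodeByCodePoint(grin).length); // 4
        System.out.println(encodeByChar(grin).length);      // 6
    }
}
```

The code-point version produces the same bytes as the JDK's own `grin.getBytes(StandardCharsets.UTF_8)`, which is one way to check a custom writer.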
As a result, even though UTF-8 is specified, these emoji are written in their
UTF-16 (surrogate pair) form, which corrupts them because they are then
processed incorrectly.
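The same distinction matters if a serializer writes non-ASCII characters as XML numeric character references. In the sketch below (hypothetical, not Axis's implementation), escaping per UTF-16 `char` produces `&#xD83D;&#xDE01;`, which is not well-formed because XML 1.0's `Char` production excludes the surrogate range U+D800 through U+DFFF. Escaping per code point produces the single valid reference `&#x1F601;`.

```java
public class XmlEscape {
    // Escape XML special characters; write non-ASCII as one reference per code point
    static String escapeByCodePoint(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);          // a surrogate pair becomes one code point
            i += Character.charCount(cp);
            appendEscaped(sb, cp);
        }
        return sb.toString();
    }

    // Hypothetical buggy variant: one reference per UTF-16 char, so a surrogate
    // pair turns into two references to surrogate values, which XML 1.0 forbids
    static String escapeByChar(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            appendEscaped(sb, s.charAt(i));
        }
        return sb.toString();
    }

    static void appendEscaped(StringBuilder sb, int cp) {
        switch (cp) {
            case '<': sb.append("&lt;"); break;
            case '>': sb.append("&gt;"); break;
            case '&': sb.append("&amp;"); break;
            case '"': sb.append("&quot;"); break;
            default:
                if (cp < 0x80) {
                    sb.append((char) cp);
                } else {
                    sb.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
                }
        }
    }

    public static void main(String[] args) {
        String grin = new String(Character.toChars(0x1F601));
        System.out.println(escapeByCodePoint(grin)); // &#x1F601;
        System.out.println(escapeByChar(grin));      // &#xD83D;&#xDE01;
    }
}
```

When the output encoding is UTF-8, writing the raw 4-byte sequence is an equally valid alternative to a character reference; either way, the surrogate pair has to be treated as one character.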
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)