Ian Maxon created ASTERIXDB-2849:
------------------------------------

             Summary: Msgpack to ADM deserialization breaks for 3 and 4 byte 
UTF-8 characters
                 Key: ASTERIXDB-2849
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2849
             Project: Apache AsterixDB
          Issue Type: Bug
            Reporter: Ian Maxon
            Assignee: Ian Maxon


Currently, deserializing strings from msgpack to ADM does not re-encode them 
from standard UTF-8 to the Modified UTF-8 used in ADM records. As a result, 
characters such as 4-byte UTF-8 sequences, which Modified UTF-8 represents as 
a surrogate pair of two 3-byte sequences, break entirely and the string is 
truncated. 
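
To illustrate the re-encoding step described above, here is a minimal sketch in 
Java. It is not the AsterixDB fix: the class name ModifiedUtf8Sketch and the 
method utf8ToModifiedUtf8 are hypothetical, and the sketch ignores whatever 
length prefix or framing the ADM string format puts around the bytes. It only 
shows how a 4-byte UTF-8 character becomes two 3-byte surrogate encodings. 

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of AsterixDB: converts standard UTF-8 bytes
// (as carried in a msgpack str payload) to Modified UTF-8 bytes.
final class ModifiedUtf8Sketch {

    static byte[] utf8ToModifiedUtf8(byte[] utf8) {
        // Decode to UTF-16. A 4-byte UTF-8 character becomes a surrogate pair,
        // i.e. two chars. Malformed input is replaced with U+FFFD by this
        // String constructor; a real fix may prefer to reject it instead.
        String s = new String(utf8, StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream(utf8.length + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                // U+0001..U+007F: one byte, same as standard UTF-8.
                out.write(c);
            } else if (c <= 0x07FF) {
                // U+0080..U+07FF, plus U+0000, which Modified UTF-8 encodes
                // as the two bytes C0 80 rather than a single zero byte.
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                // U+0800..U+FFFF, including each half of a surrogate pair:
                // three bytes per UTF-16 code unit.
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }
}
```

For comparison, java.io.DataOutputStream.writeUTF emits the same Modified UTF-8 
bytes, preceded by a 2-byte length; whether that framing matches ADM strings is 
not stated in this issue. 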



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
