Ian Maxon created ASTERIXDB-2849:
------------------------------------
Summary: Msgpack to ADM deserialization breaks for 3 and 4 byte
UTF-8 characters
Key: ASTERIXDB-2849
URL: https://issues.apache.org/jira/browse/ASTERIXDB-2849
Project: Apache AsterixDB
Issue Type: Bug
Reporter: Ian Maxon
Assignee: Ian Maxon
Currently, deserializing strings from msgpack to ADM does not re-encode them
from standard UTF-8 to the Modified UTF-8 used in ADM records. As a result,
characters that standard UTF-8 encodes in 4 bytes, which Modified UTF-8 must
represent as a surrogate pair of two 3-byte sequences, break deserialization
entirely and the string is truncated.
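
A minimal sketch of the re-encoding step described above, using only the JDK.
The class and method names (ModifiedUtf8Sketch, toModifiedUtf8) are
hypothetical and this is not the AsterixDB code; it only illustrates how a
4-byte UTF-8 character becomes two 3-byte sequences in Modified UTF-8. Any
ADM-specific framing, such as a length prefix, is assumed to be handled
elsewhere.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of AsterixDB: re-encodes standard UTF-8 bytes
// as Modified UTF-8 (the encoding described for java.io.DataInput).
final class ModifiedUtf8Sketch {

    static byte[] toModifiedUtf8(byte[] standardUtf8) {
        // Decoding to a Java String yields UTF-16 code units, so a 4-byte
        // UTF-8 character becomes a surrogate pair (two chars).
        // Note: malformed input is replaced with U+FFFD by this constructor.
        String s = new String(standardUtf8, StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                // Plain ASCII (except NUL): one byte.
                out.write(c);
            } else if (c <= 0x07FF) {
                // Two bytes; this branch also encodes U+0000 as C0 80.
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                // Three bytes; each half of a surrogate pair lands here,
                // so a supplementary character takes 6 bytes in total.
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // U+1F600 is F0 9F 98 80 (4 bytes) in standard UTF-8.
        byte[] in = "\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : toModifiedUtf8(in)) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim()); // ED A0 BD ED B8 80
    }
}
```

Going through a Java String keeps the sketch short; a production
deserializer might prefer to convert byte-to-byte and avoid the intermediate
allocation.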
--
This message was sent by Atlassian Jira
(v8.3.4#803005)