[
https://issues.apache.org/jira/browse/IGNITE-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15286334#comment-15286334
]
Igor Sapego commented on IGNITE-3140:
-------------------------------------
Denis,
According to [Wikipedia|https://en.wikipedia.org/wiki/UTF-8#Description], code
points between {{U+0800}} and {{U+FFFF}} are serialized using 3 bytes in UTF-8,
so our behavior appears to conform to the specification. Although some
implementations may treat the surrogate code points in this range as invalid,
the encoding itself is still valid.
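
For illustration, here is a minimal sketch of the 3-byte layout described above. {{EncodeThreeByteUtf8}} is a hypothetical helper written for this example, not a function from the Ignite code base, and it assumes the caller only passes code points in the {{U+0800}}..{{U+FFFF}} range.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (not part of Ignite): encodes a single code point
// from the range U+0800..U+FFFF as three UTF-8 bytes.
// Assumption: the caller guarantees cp >= 0x0800.
std::array<std::uint8_t, 3> EncodeThreeByteUtf8(std::uint16_t cp)
{
    std::array<std::uint8_t, 3> bytes = {{
        static_cast<std::uint8_t>(0xE0 | (cp >> 12)),          // 1110xxxx: top 4 bits
        static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)),  // 10xxxxxx: middle 6 bits
        static_cast<std::uint8_t>(0x80 | (cp & 0x3F))          // 10xxxxxx: low 6 bits
    }};

    return bytes;
}
```

With this layout a lone high surrogate such as {{U+D800}} becomes the bytes {{ED A0 80}}. The bit pattern is well formed, even though a strict decoder may reject the code point itself.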
The C++ standard itself does not specify a string encoding in any way and does
not include functions that operate on encodings, so there is no such thing as
serialization in the encoding sense on the C++ side. Whatever you put into a
C++ string (no matter what) remains usable, because in C++ a string is just a
sequence of characters of a specified size. So I simply can't serialize a
UTF-16 string on the C++ side unless I write the serialization algorithm
myself or use a third-party implementation.
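
As a rough idea of what a hand-written conversion could look like, the sketch below turns UTF-16 code units into UTF-8 bytes and combines surrogate pairs into a single 4-byte sequence. {{Utf16ToUtf8}} is a hypothetical name, the use of {{std::u16string}} assumes a C++11 compiler, and replacing unpaired surrogates with {{U+FFFD}} is just one possible policy. None of this is taken from the Ignite sources or the ignite-3098 branch.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Ignite): converts UTF-16 code units to
// UTF-8 bytes. A valid surrogate pair becomes one 4-byte sequence.
// Assumption: an unpaired surrogate is replaced with U+FFFD.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;

    for (std::size_t i = 0; i < in.size(); ++i)
    {
        std::uint32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF)
        {
            // High surrogate: it must be followed by a low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
            {
                std::uint32_t low = in[i + 1];
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
            else
                cp = 0xFFFD;
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
            cp = 0xFFFD; // Low surrogate without a preceding high surrogate.

        if (cp < 0x80)
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }

    return out;
}
```

For example, the pair {{D83D DE00}} (code point {{U+1F600}}) would come out as the four bytes {{F0 9F 98 80}} rather than as two separate 3-byte sequences.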
> C++: UTF-16 surrogate symbols are not serialized properly
> ---------------------------------------------------------
>
> Key: IGNITE-3140
> URL: https://issues.apache.org/jira/browse/IGNITE-3140
> Project: Ignite
> Issue Type: Bug
> Components: platforms
> Affects Versions: 1.5.0.final
> Reporter: Denis Magda
> Assignee: Vladimir Ozerov
> Fix For: 1.6
>
>
> There is an issue with serialization of a surrogate symbol with
> {{BinaryMarshaller}}. On the Java side, the String serialization logic was
> improved to support all the cases. Refer to IGNITE-3098.
> The C++ serialization logic has to be updated as well. Please refer to the
> algorithm located in the ignite-3098 branch in the following places:
> - {{BinaryUtils.utf8BytesToStr}} - deserialization
> - {{BinaryUtils.strToUtf8Bytes}} - serialization
> - {{IgniteSystemProperties.IGNITE_BINARY_MARSHALLER_USE_STRING_SERIALIZATION_VER_2}}
> controls which version of serialization logic to use (old or new).