[ https://issues.apache.org/jira/browse/IGNITE-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15286334#comment-15286334 ]

Igor Sapego commented on IGNITE-3140:
-------------------------------------

Denis,

According to [wikipedia|https://en.wikipedia.org/wiki/UTF-8#Description], code 
points between {{U+0800}} and {{U+FFFF}} are serialized using 3 bytes in UTF-8, 
so our output appears to follow the specification. Although some 
implementations may consider these code points themselves invalid, the 
encoding is still valid.
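
To make the 3-byte layout concrete, here is a minimal sketch of how a single code point in that range maps to bytes. {{EncodeThreeByteUtf8}} is a hypothetical helper written for this comment, not a function from Ignite, and it assumes the caller passes a value of at least {{U+0800}}.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Ignite): encodes one code point in the
// range U+0800..U+FFFF as three UTF-8 bytes.
// Assumption: cp >= 0x0800. Surrogates (U+D800..U+DFFF) are NOT rejected,
// which is exactly the behaviour discussed in this thread.
std::string EncodeThreeByteUtf8(std::uint16_t cp)
{
    std::string out;
    out.push_back(static_cast<char>(0xE0 | (cp >> 12)));         // 1110xxxx
    out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F))); // 10xxxxxx
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));        // 10xxxxxx
    return out;
}
```

For example, {{U+20AC}} becomes the bytes {{E2 82 AC}}, and a lone high surrogate such as {{U+D83D}} becomes {{ED A0 BD}}, which is the form that some decoders refuse to accept.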

The C++ standard itself does not specify a string encoding in any way and does 
not include functions for working with encodings, so there is no such thing as 
serialization in the encoding sense on the C++ side. This means that whatever 
you put into a C++ string (no matter what) stays usable as is, because a C++ 
string is just a sequence of characters of a specified size. So I simply can't 
serialize a UTF-16 string on the C++ side unless I write the serialization 
algorithm myself or use a third-party implementation.
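
For reference, a hand-written conversion could look roughly like the sketch below. This is only an illustration under stated assumptions: {{Utf16ToUtf8}} is a hypothetical name, the input is assumed to be a C++11 {{std::u16string}}, and the code is neither the existing Ignite C++ logic nor the algorithm from the ignite-3098 branch. A valid surrogate pair is combined into one 4-byte sequence; an unpaired surrogate is written as 3 bytes, which is a choice made for this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Ignite): converts UTF-16 code units to
// UTF-8 bytes. Assumption for this sketch: an unpaired surrogate is emitted
// as a 3-byte sequence instead of being rejected.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;

    for (std::size_t i = 0; i < in.size(); ++i)
    {
        std::uint32_t cp = in[i];

        // A high surrogate followed by a low surrogate forms one code point
        // above U+FFFF.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
        {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i; // The low surrogate has been consumed.
        }

        if (cp < 0x80)
        {
            out.push_back(static_cast<char>(cp));
        }
        else if (cp < 0x800)
        {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
        else if (cp < 0x10000)
        {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
        else
        {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }

    return out;
}
```

With this sketch the pair {{D83D DE00}} ({{U+1F600}}) is written as {{F0 9F 98 80}}, while a lone {{D83D}} is written as {{ED A0 BD}}.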

> C++: UTF-16 surrogate symbols are not serialized properly
> ---------------------------------------------------------
>
>                 Key: IGNITE-3140
>                 URL: https://issues.apache.org/jira/browse/IGNITE-3140
>             Project: Ignite
>          Issue Type: Bug
>          Components: platforms
>    Affects Versions: 1.5.0.final
>            Reporter: Denis Magda
>            Assignee: Vladimir Ozerov
>             Fix For: 1.6
>
>
> There is an issue with serialization of a surrogate symbol with 
> {{BinaryMarshaller}}. On the Java side, the String serialization logic was 
> improved to support all cases. Refer to IGNITE-3098.
> The C++ serialization logic has to be updated as well. Please refer to the 
> algorithm located in the ignite-3098 branch in the following places:
> - {{BinaryUtils.strToUtf8Bytes}} - serialization
> - {{BinaryUtils.utf8BytesToStr}} - deserialization
> - {{IgniteSystemProperties.IGNITE_BINARY_MARSHALLER_USE_STRING_SERIALIZATION_VER_2}} controls which version of serialization logic to use (old or new).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
