[ 
https://issues.apache.org/jira/browse/AVRO-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833146#comment-17833146
 ] 

Thiruvalluvan M. G. commented on AVRO-3860:
-------------------------------------------

It is a bit more complicated. According to RFC 4627, Unicode code points 
between {{0x10000}} and {{0x10ffff}} are to be encoded as two {{\u}} escape 
sequences, each representing one UTF-16 code unit of a surrogate pair. See 
section 2.5 of [RFC4627|https://www.ietf.org/rfc/rfc4627.txt]. Values above 
{{0x10ffff}} are not allowed. The fix is https://github.com/apache/avro/pull/2831
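
For illustration only, the surrogate-pair arithmetic can be sketched roughly as 
follows. This is not the code from the pull request; the helper names 
({{isHighSurrogate}}, {{isLowSurrogate}}, {{combineSurrogates}}, {{encodeUtf8}}) 
are hypothetical and exist only for this sketch. It assumes the two {{\u}} 
escapes have already been parsed into integers.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helpers for illustration; not part of the Avro C++ API.

// True if u is a UTF-16 high (leading) surrogate.
inline bool isHighSurrogate(std::uint32_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// True if u is a UTF-16 low (trailing) surrogate.
inline bool isLowSurrogate(std::uint32_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Combine a surrogate pair into a code point in [0x10000, 0x10FFFF].
inline std::uint32_t combineSurrogates(std::uint32_t high, std::uint32_t low) {
    if (!isHighSurrogate(high) || !isLowSurrogate(low)) {
        throw std::invalid_argument("not a valid surrogate pair");
    }
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
}

// Encode a single Unicode code point as UTF-8.
inline std::string encodeUtf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // A lone surrogate is not a valid code point on its own.
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            throw std::invalid_argument("lone surrogate");
        }
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Values above 0x10FFFF are not allowed.
        throw std::out_of_range("code point above 0x10FFFF");
    }
    return out;
}
```

With these helpers, the JSON text {{"\ud800\udc00"}} corresponds to U+10000, 
and {{encodeUtf8(combineSurrogates(0xD800, 0xDC00))}} yields the bytes 
{{F0 90 80 80}} that the test case in the issue description expects.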

> C++ json fails to handle unicode > U+ FFFF
> ------------------------------------------
>
>                 Key: AVRO-3860
>                 URL: https://issues.apache.org/jira/browse/AVRO-3860
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: c++
>    Affects Versions: 1.11.2
>            Reporter: Pietro Cerutti
>            Assignee: Thiruvalluvan M. G.
>            Priority: Major
>
> As a follow up of AVRO-1190, would it be possible to fix code points above 
> U+FFFF?
> I think a reasonable test case would be to add this line 
> [here:|https://github.com/apache/avro/blob/315f28d636c87eace9a6d6310de78710e1d1f85a/lang/c%2B%2B/test/JsonTests.cc#L70]
> {code:java}
> {R"("\U00010000")", EntityType::String, "\xF0\x90\x80\x80", R"("\U00010000")"},
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
