[
https://issues.apache.org/jira/browse/AVRO-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833146#comment-17833146
]
Thiruvalluvan M. G. commented on AVRO-3860:
-------------------------------------------
It is a bit more complicated. According to RFC4627, Unicode code points between
{{0x10000}} and {{0x10ffff}} are to be encoded as two {{\uXXXX}} escape sequences
(each representing one UTF-16 code unit of a surrogate pair). See section 2.5 of
[RFC4627|https://www.ietf.org/rfc/rfc4627.txt]. Values above {{0x10ffff}} are
not allowed. The fix is https://github.com/apache/avro/pull/2831
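To illustrate the encoding rule, here is a minimal sketch of the surrogate-pair arithmetic and the resulting UTF-8 bytes. The helper names ({{toSurrogatePair}}, {{fromSurrogatePair}}, {{toUtf8}}) are hypothetical and chosen for this example only; this is not the code from the pull request.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helpers for illustration; not taken from the Avro code base.

// Split a code point in [0x10000, 0x10FFFF] into a UTF-16 surrogate pair.
// Returns false if the code point is outside that range.
bool toSurrogatePair(uint32_t cp, uint16_t &high, uint16_t &low) {
    if (cp < 0x10000 || cp > 0x10FFFF) {
        return false;
    }
    uint32_t v = cp - 0x10000;                        // 20-bit offset
    high = static_cast<uint16_t>(0xD800 + (v >> 10)); // top 10 bits
    low = static_cast<uint16_t>(0xDC00 + (v & 0x3FF)); // bottom 10 bits
    return true;
}

// Combine a high and a low surrogate back into a code point.
// Returns false if either value is not a surrogate of the expected kind.
bool fromSurrogatePair(uint16_t high, uint16_t low, uint32_t &cp) {
    if (high < 0xD800 || high > 0xDBFF || low < 0xDC00 || low > 0xDFFF) {
        return false;
    }
    cp = 0x10000 + ((static_cast<uint32_t>(high) - 0xD800) << 10)
        + (static_cast<uint32_t>(low) - 0xDC00);
    return true;
}

// Encode a code point as UTF-8. Returns an empty string for lone
// surrogates and for values above 0x10FFFF, which are not allowed.
std::string toUtf8(uint32_t cp) {
    std::string out;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return out;
    }
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Under these assumptions, U+10000 would appear in JSON text as the two escapes {{\ud800\udc00}}, and decoding that pair and re-encoding it as UTF-8 gives the bytes {{F0 90 80 80}}.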
> C++ json fails to handle unicode > U+ FFFF
> ------------------------------------------
>
> Key: AVRO-3860
> URL: https://issues.apache.org/jira/browse/AVRO-3860
> Project: Apache Avro
> Issue Type: Bug
> Components: c++
> Affects Versions: 1.11.2
> Reporter: Pietro Cerutti
> Assignee: Thiruvalluvan M. G.
> Priority: Major
>
> As a follow-up to AVRO-1190, would it be possible to fix the handling of code
> points above U+FFFF?
> I think a reasonable test case would be to add this line
> [here:|https://github.com/apache/avro/blob/315f28d636c87eace9a6d6310de78710e1d1f85a/lang/c%2B%2B/test/JsonTests.cc#L70]
> {code:java}
> {R"("\U00010000")", EntityType::String, "\xF0\x90\x80\x80",
> R"("\U00010000")"},
> {code}