[
https://issues.apache.org/jira/browse/AVRO-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721350#action_12721350
]
Doug Cutting commented on AVRO-36:
----------------------------------
> How do you encode a default value of 0xFFFF - two bytes?
We'd map 1 code point to 1 byte. So the two byte sequence [FF, FF] would be
encoded in JSON as "\u00FF\u00FF".
> Do the strings "\uFFFF" and "\u00FF\u00FF" represent the same binary data?
No. We'd only use code points 0-255. So "\uFFFF" would be illegal.
I'd much prefer we avoid encodings that render text unreadable, since binary
values often include text. So that rules out base64, hex, etc., leaving us
with a choice between URL encoding and the bytes-as-codepoints encoding. URL
encoding is more compact in some cases, but transforms many textual characters,
like turning spaces into pluses. So I am currently leaning towards the codepoint
encoding. It seems the most natural in JSON. In particular, it is the
simplest to implement, since a JSON library is already required to implement
Avro: one merely constructs a string whose code points are the bytes, and
the JSON library then handles the encoding and decoding.
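The bytes-as-codepoints mapping described above can be sketched as follows. This is an illustrative example, not the actual Avro implementation: each byte 0-255 maps to the string character with the same code point, and any code point above 255 (such as "\uFFFF") is rejected on decode.

```java
public class ByteCodepoints {
    // Encode bytes as a string: one byte -> one code point in 0-255.
    // E.g. the byte sequence [0xFF, 0xFF] becomes "\u00FF\u00FF".
    static String encode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // Decode a string back to bytes; code points above 255 are illegal.
    static byte[] decode(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0xFF) {
                throw new IllegalArgumentException(
                    "illegal code point > 255: " + (int) c);
            }
            out[i] = (byte) c;
        }
        return out;
    }
}
```

Under this scheme the round trip is lossless for any byte sequence, and textual bytes remain readable in the JSON, since ASCII characters map to themselves.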
> binary default values do not decode base64
> ------------------------------------------
>
> Key: AVRO-36
> URL: https://issues.apache.org/jira/browse/AVRO-36
> Project: Avro
> Issue Type: Bug
> Components: java
> Reporter: Doug Cutting
> Assignee: Doug Cutting
>
> The specification says that default values for binary data are base64-encoded
> text, but the Java implementation uses the raw bytes of the textual value
> and does not perform base64 decoding as specified.
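The mismatch in the bug report can be illustrated with a small sketch (the string "AP8=" is a made-up example value, not taken from the issue): decoding a textual default per the spec means base64-decoding it, while the behavior described for the Java implementation amounts to taking the raw bytes of the text itself, which yields different data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DefaultValueDecode {
    public static void main(String[] args) {
        // Hypothetical textual default value: base64 for the bytes [0x00, 0xFF].
        String text = "AP8=";

        // Per the specification: base64-decode the textual value.
        byte[] specBytes = Base64.getDecoder().decode(text);

        // As the bug describes the Java implementation: raw bytes of the text.
        byte[] implBytes = text.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(specBytes.length); // 2 decoded bytes
        System.out.println(implBytes.length); // 4 raw characters
    }
}
```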