[
https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056779#comment-15056779
]
Doug Cutting commented on AVRO-1584:
------------------------------------
Ryan, I agree this is a bug in the current implementation. According to
RFC 4627, control characters must be escaped.
bq. All Unicode characters may be placed within the quotation marks except for
the characters that must be escaped: quotation mark, reverse solidus, and the
control characters (U+0000 through U+001F).
I note that this was fixed for strings in AVRO-713 and we can probably share
this logic.
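
To illustrate the escaping rule quoted above, here is a minimal sketch in plain Java. It is not Avro's code and not the AVRO-713 fix; the class and method names are hypothetical, and it assumes that each byte is mapped one-to-one to the character with the same code point (0 through 255) before quoting.

```java
import java.nio.charset.StandardCharsets;

public class JsonEscapeSketch {
    // Hypothetical helper: quotes a string as a JSON string literal, escaping
    // the quotation mark, the reverse solidus, and control characters
    // (U+0000 through U+001F), as RFC 4627 requires.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c <= 0x1F) {
                // Six-character form: backslash, 'u', four hex digits.
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Assumption: bytes map to the characters with the same code points,
    // which is what decoding as ISO-8859-1 does.
    static String bytesToChars(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The byte array from the issue description.
        byte[] data = {0, 31, 65, 66, 67, (byte) 255, (byte) 182};
        System.out.println(quote(bytesToChars(data)));
    }
}
```

With this approach the first two bytes (0 and 31) come out as escape sequences instead of raw control characters, so the result is a valid JSON string.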
The difference between toString() JSON and Avro's JSON data encoding is
longstanding and primarily around the encoding of unions. For full read/write
fidelity, many union values must be tagged with their type, so that's what the
JSON encoding requires. The toString() encoding was not intended for data
fidelity but for debugging, so a simpler version was implemented. (It actually
pre-dates the specification of the JSON encoding.) It so happens that default
values in schemas do not need to be tagged, so the toString() format is
identical to the default-value format.
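
To make the difference concrete, the sketch below prints the same union value in both forms. It is plain Java with hypothetical helper names and does not call any Avro API; it assumes a field "name" of type ["null", "string"] holding the string "a".

```java
public class UnionTaggingSketch {
    // Tagged form, as in the JSON data encoding: null stays null, and any
    // other branch is wrapped in a one-field object named after its type.
    static String tagged(String branchType, String jsonValue) {
        if (branchType.equals("null")) {
            return "null";
        }
        return "{\"" + branchType + "\": " + jsonValue + "}";
    }

    // Untagged form, as in toString() and default values: the bare value.
    static String untagged(String jsonValue) {
        return jsonValue;
    }

    public static void main(String[] args) {
        String value = "\"a\"";
        // prints {"name": {"string": "a"}}
        System.out.println("{\"name\": " + tagged("string", value) + "}");
        // prints {"name": "a"}
        System.out.println("{\"name\": " + untagged(value) + "}");
    }
}
```

A reader of the untagged form has to infer the branch from the value itself, which is why the tagged form is the one that guarantees read/write fidelity.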
However there are frequent requests for a reader that accepts such an untagged
format, for interaction with other JSON-generating software. In retrospect,
the JSON encoding should perhaps not require tagging for unions with null or
unions between a primitive and a non-primitive, i.e., only tag unions when it's
required. We instead opted for simplicity of specification and implementation, to
ease interoperability between various Avro implementations, when perhaps in
this case we should have optimized for ease of interoperability with non-Avro
producers and consumers of JSON.
So long-term we might add an encoder/decoder that doesn't handle unions at all
or that handles them more parsimoniously, then perhaps implement default values
and toString() using this encoding. But I don't think we should alter the
currently specified JSON encoding, nor change the default or toString() format.
> Json output doesn't generate base64 for byte arrays
> ---------------------------------------------------
>
> Key: AVRO-1584
> URL: https://issues.apache.org/jira/browse/AVRO-1584
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.7
> Environment: Pure java.
> Reporter: Christophe Lorenz
> Attachments: AVRO-1584-Jackson-Base64-Default-Variant.patch,
> AVRO-1584.patch
>
>
> The Json output of java generated code doesn't correctly encode byte arrays.
> Using this simple schema :
> {"namespace": "example.avro",
> "type": "record",
> "name": "ByteArrayEncoding",
> "fields": [ {"name": "data", "type": "bytes"} ]
> }
> The toString()
> System.out.println(new ByteArrayEncoding(ByteBuffer.wrap(new
> byte[]{0,31,65,66,67,(byte)255,(byte)182})));
> Returns raw bytes to string in the json :
> {"data": {"bytes": " ABC??"}}
> As a byte array is not tied to be a valid string, it should be converted back
> and forth to Base64 like other Json implementations :
> {"data": {"bytes": "AB9BQkP/tg=="}}
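
The Base64 value the reporter expects can be reproduced with the standard library alone. This is only a sketch of the proposed representation, not Avro's behavior and not the content of the attached patches; it assumes Java 8 or later for java.util.Base64, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64BytesSketch {
    // Encodes raw bytes with the standard (RFC 4648) Base64 alphabet.
    static String encode(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Decodes a Base64 string back into the original bytes.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // The byte array from the issue description.
        byte[] data = {0, 31, 65, 66, 67, (byte) 255, (byte) 182};
        String encoded = encode(data);
        System.out.println(encoded); // prints AB9BQkP/tg==
        // The round trip restores the exact bytes.
        System.out.println(Arrays.equals(decode(encoded), data)); // prints true
    }
}
```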