[ 
https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056779#comment-15056779
 ] 

Doug Cutting commented on AVRO-1584:
------------------------------------

Ryan, I agree this is a bug in the current implementation.  According to 
RFC 4627, control characters must be escaped.
bq. All Unicode characters may be placed within the quotation marks except for 
the characters that must be escaped: quotation mark, reverse solidus, and the 
control characters (U+0000 through U+001F).
I note that this was fixed for strings in AVRO-713 and we can probably share 
this logic.
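
As an illustration only, here is a minimal sketch of the escaping rule quoted 
above, applied to a byte array.  It assumes each byte is mapped to the Unicode 
code point of the same value (0-255), which is how the Avro specification 
describes bytes in JSON.  The class and the toJsonString helper are 
hypothetical names, not Avro's actual code and not the AVRO-713 logic.  
Escaping everything above 0x7E is a choice made here to keep the output 
ASCII-only; RFC 4627 only requires escaping the quotation mark, the reverse 
solidus, and U+0000 through U+001F.

```java
class BytesJsonSketch {

    // Hypothetical helper: renders bytes as a JSON string literal,
    // treating each byte as the code point with the same value (0-255).
    static String toJsonString(byte[] bytes) {
        StringBuilder sb = new StringBuilder("\"");
        for (byte b : bytes) {
            int c = b & 0xFF; // unsigned value of the byte
            if (c == '"') {
                sb.append("\\\""); // quotation mark must be escaped
            } else if (c == '\\') {
                sb.append("\\\\"); // reverse solidus must be escaped
            } else if (c < 0x20 || c > 0x7E) {
                // Control characters must be escaped; non-ASCII values are
                // escaped too (optional) so the output stays plain ASCII.
                sb.append(String.format("\\u%04X", c));
            } else {
                sb.append((char) c); // printable ASCII passes through
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // Same bytes as in the issue description.
        byte[] data = {0, 31, 65, 66, 67, (byte) 255, (byte) 182};
        System.out.println(toJsonString(data));
    }
}
```

For the bytes from the issue description this prints 
"\u0000\u001FABC\u00FF\u00B6", which is valid JSON and can be mapped back to 
the original seven bytes.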

The difference between toString() JSON and Avro's JSON data encoding is 
longstanding and primarily around the encoding of unions.  For full read/write 
fidelity, many union values must be tagged with their type, so that's what the 
JSON encoding requires.  The toString() encoding was not intended for data 
fidelity but for debugging, so a simpler version was implemented.  (It actually 
pre-dates the specification of the JSON encoding.)  It so happens that default 
values in schemas do not need to be tagged, so the toString() format is 
identical to the default-value format.
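
To make the difference concrete, here is a small sketch of the two shapes for 
a field whose type is the union ["null", "string"].  The field name "name", 
the value "Ryan", and the helper methods are made up for illustration and use 
no Avro classes.  The tagged form follows the JSON encoding rule in the 
specification: null is written as a bare JSON null, and any other union value 
is wrapped in a one-entry object keyed by its type name.

```java
class UnionTaggingSketch {

    // Tagged form, as the JSON data encoding requires for union values.
    // jsonValue is assumed to be already-encoded JSON text.
    static String tagged(String typeName, String jsonValue) {
        if (typeName.equals("null")) {
            return "null"; // null is never wrapped
        }
        return "{\"" + typeName + "\": " + jsonValue + "}";
    }

    // Untagged form, as used by toString() and by default values.
    static String untagged(String jsonValue) {
        return jsonValue;
    }

    // Wraps an encoded field value in a one-field record object.
    static String record(String fieldName, String encodedValue) {
        return "{\"" + fieldName + "\": " + encodedValue + "}";
    }

    public static void main(String[] args) {
        String value = "\"Ryan\"";
        System.out.println(record("name", tagged("string", value)));
        System.out.println(record("name", untagged(value)));
        System.out.println(record("name", tagged("null", "null")));
    }
}
```

The first line printed is {"name": {"string": "Ryan"}}, the second is 
{"name": "Ryan"}, and the third is {"name": null}.  Only the first form tells 
a reader which branch of the union was written.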

However, there are frequent requests for a reader that accepts such an untagged 
format, for interaction with other JSON-generating software.  In retrospect, 
the JSON encoding should perhaps not require tagging for unions with null or 
unions between a primitive and a non-primitive, i.e., only tag unions when it's 
required.  We instead opted for simplicity of specification and implementation, to 
ease interoperability between various Avro implementations, when perhaps in 
this case we should have optimized for ease of interoperability with non-Avro 
producers and consumers of JSON.

So long-term we might add an encoder/decoder that doesn't handle unions at all 
or that handles them more parsimoniously, then perhaps implement default values 
and toString() using this encoding.  But I don't think we should alter the 
currently specified JSON encoding, nor change the default or toString() format.



> Json output doesn't generate base64 for byte arrays
> ---------------------------------------------------
>
>                 Key: AVRO-1584
>                 URL: https://issues.apache.org/jira/browse/AVRO-1584
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.7
>         Environment: Pure java.
>            Reporter: Christophe Lorenz
>         Attachments: AVRO-1584-Jackson-Base64-Default-Variant.patch, 
> AVRO-1584.patch
>
>
> The Json output of java generated code doesn't correctly encode byte arrays.
> Using this simple schema : 
> {"namespace": "example.avro",
>  "type": "record",
>  "name": "ByteArrayEncoding",
>  "fields": [     {"name": "data", "type": "bytes"} ]
> }
> The toString()  
>       System.out.println(new ByteArrayEncoding(ByteBuffer.wrap(new 
> byte[]{0,31,65,66,67,(byte)255,(byte)182})));
> Returns raw bytes to string in the json :
> {"data": {"bytes": "  ABC??"}}
> Since a byte array is not guaranteed to be a valid string, it should be 
> converted to and from Base64, as other JSON implementations do: 
> {"data": {"bytes": "AB9BQkP/tg=="}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
