[
https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056532#comment-15056532
]
Doug Cutting commented on AVRO-1584:
------------------------------------
The problem you originally cite (question marks in output) is caused by using a
non-UTF8 encoding when printing the value of toString(), not by that value
itself. So there's not actually a bug here. The string produced by toString()
loses no information. Rather, you seek either an (incompatible) change or a new
feature.
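To illustrate the point about output encoding, here is a minimal sketch that
does not use Avro at all. The class PrintEncodingSketch and its helpers
bytesToChars() and printedBytes() are hypothetical names, and the one-char-per-byte
mapping is only an assumed model of the string form, not Avro's actual code. It
suggests that the same string prints as question marks through a US-ASCII
stream but survives intact through a UTF-8 stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

class PrintEncodingSketch {
    // Hypothetical stand-in for the string form: one char in the range 0-255 per byte.
    static String bytesToChars(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length);
        for (byte b : data) {
            // Mask so that (byte) 255 becomes U+00FF rather than being sign-extended.
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // Returns the bytes that a PrintStream using the named charset emits for the text.
    static byte[] printedBytes(String text, String charsetName) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
            return sink.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {0, 31, 65, 66, 67, (byte) 255, (byte) 182};
        String text = bytesToChars(data);

        // US-ASCII cannot represent U+00FF or U+00B6, so the encoder substitutes '?'.
        String ascii = new String(printedBytes(text, "US-ASCII"), StandardCharsets.US_ASCII);
        System.out.println("ASCII output ends with ABC??: " + ascii.endsWith("ABC??"));

        // UTF-8 can represent every char, so decoding the output recovers the string.
        String utf8 = new String(printedBytes(text, "UTF-8"), StandardCharsets.UTF_8);
        System.out.println("UTF-8 output round-trips: " + utf8.equals(text));
    }
}
```

In this sketch the information is lost at the PrintStream, not in the string
that was handed to it.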
Changing the format of toString() for binary values incompatibly to base64
seems likely to break applications, e.g. those that use toString() to
supply default values to the schema builder API. I question whether this is of
sufficient benefit to be worth doing even in a release that permits
incompatibilities. There is no perfect string format for binary values. The
one currently used here (and by the spec for default values) makes textual
values more legible, while base64 makes non-textual values more tolerant of
non-UTF8-safe i/o.
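The tradeoff between the two formats can be sketched as follows. This is an
illustration only: the class BytesFormatSketch and the helpers toCodePoints()
and fromCodePoints() are hypothetical, and they model the spec's description
of bytes defaults (code points 0-255 mapped to byte values 0-255) rather than
reproducing Avro's implementation. Base64 comes from java.util.Base64, which
assumes Java 8 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class BytesFormatSketch {
    // Hypothetical model of the code-point format: byte value n becomes char n.
    static String toCodePoints(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length);
        for (byte b : data) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // Inverse mapping; rejects chars outside 0-255 since they have no byte value.
    static byte[] fromCodePoints(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0xFF) {
                throw new IllegalArgumentException("char out of byte range at index " + i);
            }
            out[i] = (byte) c;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] textual = "hello".getBytes(StandardCharsets.US_ASCII);
        byte[] binary = {0, 31, 65, 66, 67, (byte) 255, (byte) 182};

        // Textual content stays readable in the code-point format but not in base64.
        System.out.println(toCodePoints(textual));
        System.out.println(Base64.getEncoder().encodeToString(textual));

        // Non-textual content becomes plain ASCII in base64, so any i/o path can carry it.
        System.out.println(Base64.getEncoder().encodeToString(binary));

        // Both formats are lossless as strings; they differ in legibility and transport safety.
        System.out.println(Arrays.equals(binary, fromCodePoints(toCodePoints(binary))));
        System.out.println(Arrays.equals(binary,
                Base64.getDecoder().decode(Base64.getEncoder().encodeToString(binary))));
    }
}
```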
Perhaps we should instead add a flag that one can set to change
GenericData#toString() so that it generates base64? We should also certainly
add some tests for the current format if there are none.
> Json output doesn't generate base64 for byte arrays
> ---------------------------------------------------
>
> Key: AVRO-1584
> URL: https://issues.apache.org/jira/browse/AVRO-1584
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.7
> Environment: Pure java.
> Reporter: Christophe Lorenz
> Attachments: AVRO-1584-Jackson-Base64-Default-Variant.patch,
> AVRO-1584.patch
>
>
> The Json output of java generated code doesn't correctly encode byte arrays.
> Using this simple schema :
> {"namespace": "example.avro",
> "type": "record",
> "name": "ByteArrayEncoding",
> "fields": [ {"name": "data", "type": "bytes"} ]
> }
> The toString()
> System.out.println(new ByteArrayEncoding(ByteBuffer.wrap(new
> byte[]{0,31,65,66,67,(byte)255,(byte)182})));
> Returns raw bytes to string in the json :
> {"data": {"bytes": " ABC??"}}
> As a byte array is not tied to be a valid string, it should be converted back
> and forth to Base64 like other Json implementations :
> {"data": {"bytes": "AB9BQkP/tg=="}}