[ 
https://issues.apache.org/jira/browse/AVRO-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715590#action_12715590
 ] 

Doug Cutting commented on AVRO-36:
----------------------------------

> I like the spec the way it is i.e. length + actual bytes 

The question is not how to encode binary values in Avro, but rather how to 
encode default values for binary fields in schemas.  Schemas are written in 
JSON, which has no support for binary values, only for UTF-8 strings.

It is possible to encode arbitrary binary values in UTF-8, by encoding each 
byte as a code point.  The number of bytes encoded will differ from the raw 
binary, as bytes between 128 and 255 must be encoded as two bytes.  This has 
the advantage of rendering ASCII portions of binary data in a readable manner, 
but, in pathological cases, it can double data size.  Base64 is more opaque, 
but fixes the encoded size at about 4/3 (roughly 1.33) times the number of 
bytes, plus padding.
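
To make the size trade-off concrete, here is a minimal sketch in Python.  It 
assumes that "encoding each byte as a code point" means mapping byte value N 
to code point U+00NN, which is what Python's latin-1 codec does.  The helper 
names and sample values are hypothetical and are not taken from Avro.

```python
import base64


def bytes_as_code_points(data: bytes) -> str:
    # Map each byte value N to the code point U+00NN (assumed interpretation).
    return data.decode("latin-1")


def code_points_to_bytes(text: str) -> bytes:
    # Inverse mapping; raises UnicodeEncodeError for code points above U+00FF.
    return text.encode("latin-1")


ascii_sample = b"hello"                         # every byte below 128
high_sample = bytes([0x80, 0xFF, 0xC3, 0xA9])   # every byte 128 or above

# Size of the code-point form once the JSON text is stored as UTF-8.
ascii_utf8_len = len(bytes_as_code_points(ascii_sample).encode("utf-8"))
high_utf8_len = len(bytes_as_code_points(high_sample).encode("utf-8"))

# Size of the base64 form, which is pure ASCII.
ascii_b64_len = len(base64.b64encode(ascii_sample))
high_b64_len = len(base64.b64encode(high_sample))

print(ascii_utf8_len, ascii_b64_len)  # ASCII data: unchanged vs. padded base64
print(high_utf8_len, high_b64_len)    # high bytes: doubled vs. padded base64
```

The ASCII sample stays readable and keeps its length in the code-point form, 
while the all-high-byte sample doubles.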

For default values I'm not worried about the size, but base64 is a more 
standard way of encoding binary values in text than perverting Unicode.  In 
particular, base64 is designed to survive email and text editors, which makes 
it easier to process as source code, as schemas will sometimes be.

Ideally we'd use an encoding that was both text-editor/email friendly and 
transparent.  URL encoding might thus be a better choice than base64 or raw 
UTF-8.  It's also readily available on most platforms.  How would folks feel 
about using URL encoding for default values of binary fields in JSON schemas?
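
As a rough illustration of what a URL-encoded default might look like, the 
sketch below uses Python's standard urllib.parse functions.  The sample bytes 
and variable names are hypothetical, and nothing here reflects a format that 
Avro has adopted.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# Hypothetical default value: readable ASCII followed by two non-ASCII bytes.
default_bytes = b"abc\x00\xff"

# Percent-encode: unreserved ASCII stays readable, other bytes become %XX.
encoded = quote_from_bytes(default_bytes)

# Decoding recovers the original bytes exactly.
decoded = unquote_to_bytes(encoded)

print(encoded)                    # abc%00%FF
print(decoded == default_bytes)   # True
```

The encoded text is plain ASCII, so it should survive email and text editors, 
and the ASCII portion of the data remains legible.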


> binary default values do not decode base64
> ------------------------------------------
>
>                 Key: AVRO-36
>                 URL: https://issues.apache.org/jira/browse/AVRO-36
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>
> The specification says that default values for binary data are base64 encoded 
> text, but the Java implementation uses the raw bytes of the textual value, 
> and does not perform base64 decoding as specified.
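
The mismatch described in the issue can be sketched as follows.  This is a 
Python illustration of the two behaviors, not the Java implementation's 
actual code, and the default text is a hypothetical example.

```python
import base64

# Hypothetical default as it would appear in a JSON schema.
default_text = "aGVsbG8="

# Behavior reported in the issue: the raw bytes of the text are used as-is.
raw_bytes = default_text.encode("utf-8")

# Behavior the specification describes: the text is base64-decoded.
decoded_bytes = base64.b64decode(default_text)

print(raw_bytes)      # b'aGVsbG8='
print(decoded_bytes)  # b'hello'
```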

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
