I don't think we can change the behavior of the "default" key. Otherwise, older readers would use the wrong value.
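
To illustrate the concern, here is a minimal sketch of how a reader following the current rules would interpret a "default" that had been switched to human-readable text. It assumes a decimal logical type backed by bytes with a scale of 2 (the scale and the helper names are made up for this example), and relies on the spec's rule that each code point 0-255 in a bytes default maps to one byte.

```python
from decimal import Decimal


def bytes_default_from_json(default: str) -> bytes:
    """Decode a "bytes" default as currently specified:
    each Unicode code point 0-255 becomes one byte."""
    return default.encode("iso-8859-1")


def decimal_from_bytes(raw: bytes, scale: int) -> Decimal:
    """Interpret bytes as a big-endian two's-complement unscaled integer
    and apply the decimal scale."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


# Hypothetical scale, chosen only for this example.
SCALE = 2

# What the schema author has to write today to get zero.
intended = decimal_from_bytes(bytes_default_from_json("\u0000"), SCALE)

# What an older reader would compute if "default" held human-readable text.
misread = decimal_from_bytes(bytes_default_from_json("0.00"), SCALE)

print(intended)  # 0.00
print(misread)   # 8083333.60
```

The four characters of "0.00" are read as the bytes 0x30 0x2E 0x30 0x30, so the older reader silently ends up with a large non-zero default.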
I suggest that we add an optional key, like "default-as-string", that is used to fill in a missing "default" key if there is a reasonable conversion. On write, the write schema would convert to the normal "default" field for backward-compatibility. On read, you can supply only the string default to use that instead of the binary one. I think we could take care of this entirely in the schema parser.

rb

On Tue, Oct 17, 2017 at 11:53 PM, Bridger Howell <[email protected]> wrote:

> I really like the idea of having support for human-readable default values.
>
> I think I prefer to keep the way defaults are interpreted separate from
> logical types, since logical types are basically optional. I would be
> surprised if my language of choice could understand an ISO-8601 formatted
> local-date for a field default based on logical type, but I still had to
> interface with a numeric value in my code.
>
> If this doesn't conflict too much with the default value for record fields
> (?), I would suggest having an object syntax with a "parser" or "type"
> field in addition to the default property.
>
> A sample record:
> {
>   "type": "record",
>   "name": "Foo",
>   "fields": [
>     {
>       "name": "body",
>       "type": "bytes",
>       "default": {
>         "value": "aGVsbG8gd29ybGQ",
>         "parser": "base64",
>         "doc": "'hello world' as a base64-encoded string"
>       }
>     }
>   ]
> }
>
> If changing the "default" property like that has too many issues, I suppose
> a parallel "default-parser" property would do the trick too.
>
> I think this type of approach keeps us neatly separated from logical types,
> so that having a parser for a default value doesn't require a logical type,
> and maybe makes it clearer which procedure is being performed on the JSON
> data to convert it to the base field type.
>
> -Bridger Howell
>
> On Tue, Oct 17, 2017 at 9:57 AM, Ryan Blue <[email protected]> wrote:
>
> > I think that the parsing canonical form of a schema
> > <https://avro.apache.org/docs/1.8.2/spec.html#Parsing+Canonical+Form+for+Schemas>
> > doesn't include the default. I think that makes sense because the
> > canonical form is what's needed to read encoded data. Anyone with more
> > context: is that correct?
> >
> > In my opinion, that makes how we handle defaults a bit more flexible
> > because schemas with different defaults are "the same". I'd support
> > adding a new default field that handles values more naturally. We've
> > always had a problem with binary as well and I'd like to see us use
> > base64 encoded values instead of the current strategy.
> >
> > rb
> >
> > On Tue, Oct 17, 2017 at 8:16 AM, Zoltan Ivanfi <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I would like to start a discussion about making default values and
> > > values in general human-readable for logical types.
> > >
> > > Currently default values for logical types have to be specified in a
> > > JSON string as the binary representation of the backing primary type
> > > (e.g., "\u0000"). Some users intuitively try to specify a
> > > human-readable logical value in this string instead (e.g., "0.00").
> > > This is of course a valid byte sequence and as such is accepted, but it
> > > results in unexpected behaviour (a different default value than
> > > intended). Apart from being error prone, specifying default values this
> > > way is also tedious. To keep this e-mail brief, I won't list specific
> > > examples here, please see AVRO-2087
> > > <https://issues.apache.org/jira/browse/AVRO-2087> for details instead.
> > >
> > > The problem of non-human-readable values applies to JSON encoding of
> > > actual data as well. One reason for using JSON is that it is human
> > > readable and therefore easy to debug.
> > > Seeing "\u00018" in a JSON file is not too intuitive and this specific
> > > example is actually quite misleading as well (it can be easily misread
> > > as "\u0018").
> > >
> > > Introducing a new default value field (called human-readable-default or
> > > logical-default for example) would allow easier specification of
> > > default values. (It doesn't solve the problem of accidentally misusing
> > > the existing field though.) It is, however, not backwards compatible.
> > > An older Avro library would ignore the new field and use a different
> > > default value.
> > >
> > > Introducing human-readable values in the JSON encoding is even more
> > > clearly a breaking change. (Although for JSON we could add the
> > > human-readable value as a separate extra field that gets ignored when
> > > reading. Problem is, users may be tempted to change the value and be
> > > surprised. It's a pity that JSON does not allow comments.)
> > >
> > > In your opinions, what would be the best way to deal with this problem?
> > >
> > > Thanks,
> > >
> > > Zoltan
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix

--
Ryan Blue
Software Engineer
Netflix
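
To make the "default-as-string" suggestion at the top of this message more concrete, here is a rough sketch of what the parser-side step might look like. Everything in it is an assumption for illustration: the key name comes from the suggestion, but the function name, the converter table, and the choice of base64 for bytes are hypothetical and not an existing Avro API.

```python
import base64


def _bytes_default_from_base64(text: str) -> str:
    """Turn base64 text into the JSON string form used by "default" today
    (one code point per byte). Base64 here is an assumed convention."""
    return base64.b64decode(text).decode("iso-8859-1")


# Hypothetical table of "reasonable conversions", keyed by primitive type name.
CONVERTERS = {
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "string": str,
    "bytes": _bytes_default_from_base64,
}


def fill_default(field: dict) -> dict:
    """Return a copy of a field definition with "default" filled in from
    "default-as-string" when "default" is missing and a converter exists."""
    filled = dict(field)
    if "default" in field or "default-as-string" not in field:
        return filled  # an explicit "default" always wins
    field_type = field["type"]
    convert = CONVERTERS.get(field_type) if isinstance(field_type, str) else None
    if convert is not None:
        filled["default"] = convert(field["default-as-string"])
    return filled


body = fill_default(
    {"name": "body", "type": "bytes", "default-as-string": "aGVsbG8gd29ybGQ="}
)
count = fill_default({"name": "count", "type": "int", "default-as-string": "42"})
kept = fill_default(
    {"name": "n", "type": "int", "default": 7, "default-as-string": "42"}
)
```

Because the normal "default" key is written out after this step, older readers keep seeing the value they already understand, and an existing "default" is never overwritten.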
