I think that the parsing canonical form of a schema <https://avro.apache.org/docs/1.8.2/spec.html#Parsing+Canonical+Form+for+Schemas> doesn't include the default. I think that makes sense because the canonical form is what's needed to read encoded data. Anyone with more context: is that correct?
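To make the question concrete, here is a minimal sketch of the attribute-stripping step of the Parsing Canonical Form, as I read the spec linked above. It is a simplification: it does not resolve fullnames or normalize primitives, and the record and field names are made up for illustration. It only shows that two schemas differing in `default` collapse to the same text.

```python
import json

# Attributes the spec's STRIP rule keeps, in the spec's ORDER; everything
# else (doc, aliases, default, logicalType, ...) is dropped.
KEPT = ("name", "type", "fields", "symbols", "items", "values", "size")


def strip_schema(schema):
    """Recursively keep only canonical attributes (simplified sketch)."""
    if isinstance(schema, list):  # a union is a JSON array of schemas
        return [strip_schema(branch) for branch in schema]
    if isinstance(schema, dict):
        out = {}
        for key in KEPT:
            if key not in schema:
                continue
            value = schema[key]
            if key in ("type", "items", "values"):
                out[key] = strip_schema(value)
            elif key == "fields":
                out[key] = [strip_schema(field) for field in value]
            else:
                out[key] = value
        return out
    return schema  # a primitive type name such as "bytes"


def canonical_text(schema):
    """Serialize the stripped schema without whitespace."""
    return json.dumps(strip_schema(schema), separators=(",", ":"))


def make_schema(default):
    # Hypothetical record, used only for this illustration.
    decimal_type = {"type": "bytes", "logicalType": "decimal",
                    "precision": 9, "scale": 2}
    return {"type": "record", "name": "Payment",
            "fields": [{"name": "amount", "type": decimal_type,
                        "default": default}]}


same = canonical_text(make_schema("\u0000")) == canonical_text(make_schema("0.00"))
```

Under these assumptions `same` is true, since `default` never reaches the canonical text.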
In my opinion, that makes how we handle defaults a bit more flexible because schemas with different defaults are "the same". I'd support adding a new default field that handles values more naturally. We've always had a problem with binary as well and I'd like to see us use base64 encoded values instead of the current strategy.

rb

On Tue, Oct 17, 2017 at 8:16 AM, Zoltan Ivanfi <[email protected]> wrote:

> Hi,
>
> I would like to start a discussion about making default values and values
> in general human-readable for logical types.
>
> Currently default values for logical types have to be specified in a JSON
> string as the binary representation of the backing primary type (e.g.,
> "\u0000"). Some users intuitively try to specify a human-readable logical
> value in this string instead (e.g., "0.00"). This is of course a valid byte
> sequence and as such is accepted, but it results in unexpected behaviour (a
> different default value than intended). Apart from being error prone,
> specifying default values this way is also tedious. To keep this e-mail
> brief, I won't list specific examples here, please see AVRO-2087
> <https://issues.apache.org/jira/browse/AVRO-2087> for details instead.
>
> The problem of non-human-readable values applies to JSON encoding of actual
> data as well. One reason for using JSON is that it is human readable and
> therefore easy to debug. Seeing "\u00018" in a JSON file is not too
> intuitive and this specific example is actually quite misleading as well
> (it can be easily misread as "\u0018").
>
> Introducing a new default value field (called human-readable-default or
> logical-default for example) would allow easier specification of default
> values. (It doesn't solve the problem of accidentally misusing the existing
> field though.) It is, however, not backwards compatible. An older Avro
> library would ignore the new field and use a different default value.
>
> Introducing human-readable values in the JSON encoding is even more clearly
> a breaking change. (Although for JSON we could add the human-readable value
> as a separate extra field that gets ignored when reading. Problem is, users
> may be tempted to change the value and be surprised. It's a pity that JSON
> does not allow comments.)
>
> In your opinions, what would be the best way to deal with this problem?
>
> Thanks,
>
> Zoltan
>

-- 
Ryan Blue
Software Engineer
Netflix
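
For anyone following along, here is a rough sketch of why a "0.00" default goes wrong for a decimal backed by bytes. It assumes a scale of 2, which is my choice for illustration and not taken from AVRO-2087. The helper names are hypothetical and this is not the Avro library's API. The last line shows what a base64 encoded default for the single zero byte could look like.

```python
import base64
from decimal import Decimal


def bytes_from_avro_json(text):
    """Avro's JSON form maps each code point 0-255 to one byte (ISO-8859-1)."""
    return text.encode("iso-8859-1")


def decimal_from_bytes(raw, scale):
    """Decimal logical type: big-endian two's-complement unscaled integer."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


SCALE = 2  # assumed for this example

# The default the schema author has to write today to get zero.
intended = decimal_from_bytes(bytes_from_avro_json("\u0000"), SCALE)

# The default users intuitively write: four ASCII bytes 0x30 0x2E 0x30 0x30.
accidental = decimal_from_bytes(bytes_from_avro_json("0.00"), SCALE)

# The easily confused pair from the quoted message.
two_bytes = decimal_from_bytes(bytes_from_avro_json("\u00018"), SCALE)
one_byte = decimal_from_bytes(bytes_from_avro_json("\u0018"), SCALE)

# A base64 rendering of the single zero byte, as one possible alternative.
base64_default = base64.b64encode(b"\x00").decode("ascii")
```

With scale 2, `intended` is 0.00 while `accidental` is 8083333.60, and the two look-alike strings decode to 3.12 and 0.24. The base64 form of the zero byte is "AA==", which is unambiguous but still not human-readable as a decimal.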
