Hi,

I would like to start a discussion about making default values, and values
in general, human-readable for logical types.

Currently, default values for logical types have to be specified in a JSON
string as the binary representation of the backing primitive type (e.g.,
"\u0000"). Some users intuitively try to specify a human-readable logical
value in this string instead (e.g., "0.00"). This is of course a valid byte
sequence and as such is accepted, but it results in unexpected behaviour (a
different default value than intended). Apart from being error-prone,
specifying default values this way is also tedious. To keep this e-mail
brief, I won't list specific examples here; please see AVRO-2087
<https://issues.apache.org/jira/browse/AVRO-2087> for details.
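
As a rough illustration of the pitfall, here is a minimal Python sketch. It
is not Avro library code: decode_decimal_default is a hypothetical helper,
and the scale of 2 is an assumed schema setting. It relies on the decimal
logical type storing the unscaled value as big-endian two's-complement
bytes, and on JSON defaults for bytes mapping code points 0-255 to byte
values.

```python
from decimal import Decimal


def decode_decimal_default(json_default: str, scale: int) -> Decimal:
    """Interpret a bytes-typed JSON default as a decimal logical value.

    Hypothetical helper for illustration only.
    """
    # Each code point 0-255 in the JSON string stands for one byte.
    raw = json_default.encode("latin-1")
    # The bytes hold the unscaled value, big-endian two's complement.
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    # Shift the decimal point left by `scale` digits.
    return Decimal(unscaled).scaleb(-scale)


# The intended way to write a default of 0.00 (scale 2): a single zero byte.
print(decode_decimal_default("\u0000", 2))

# What some users write instead: the four ASCII bytes '0', '.', '0', '0'.
# They are read as the unscaled integer 0x302E3030, not as zero.
print(decode_decimal_default("0.00", 2))
```

Under these assumptions the second call yields a large non-zero value
rather than 0.00, which is the silent mismatch described above.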

The problem of non-human-readable values applies to JSON encoding of actual
data as well. One reason for using JSON is that it is human readable and
therefore easy to debug. Seeing "\u00018" in a JSON file is not too
intuitive and this specific example is actually quite misleading as well
(it can be easily misread as "\u0018").
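
To show where a string like "\u00018" can come from, here is a small Python
sketch of the encoding direction. Again, this is only an assumption-laden
illustration, not the Avro implementation: encode_decimal_json is a
hypothetical helper, the scale of 2 is assumed, and the byte length it picks
is sufficient but not guaranteed to be minimal for every negative value.

```python
import json
from decimal import Decimal


def encode_decimal_json(value: Decimal, scale: int) -> str:
    """Render a decimal as the JSON string form of its backing bytes.

    Hypothetical helper for illustration only.
    """
    # Drop the decimal point: 3.12 at scale 2 becomes the integer 312.
    unscaled = int(value.scaleb(scale))
    # Enough bytes to hold the value plus a sign bit.
    length = (unscaled.bit_length() + 8) // 8
    raw = unscaled.to_bytes(length, byteorder="big", signed=True)
    # Map each byte to the code point with the same number, then let the
    # JSON encoder escape control and non-ASCII characters.
    return json.dumps(raw.decode("latin-1"))


# 312 is 0x0138: byte 0x01 followed by byte 0x38, which is the ASCII digit '8'.
print(encode_decimal_json(Decimal("3.12"), 2))

# 24 is 0x18: a single byte, printed as one escape sequence.
print(encode_decimal_json(Decimal("0.24"), 2))
```

The two outputs differ by a single character even though they stand for
3.12 and 0.24 respectively.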

Introducing a new default value field (called human-readable-default or
logical-default for example) would allow easier specification of default
values. (It doesn't solve the problem of accidentally misusing the existing
field though.) It is, however, not backwards compatible. An older Avro
library would ignore the new field and use a different default value.
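
The compatibility concern can be sketched as follows. Everything here is
hypothetical: the field name "logical-default" is just one of the candidate
names above, the "price" field and its values are made up, and the two
reader functions only imitate how an older and a newer library might pick
the default. Neither is real Avro code.

```python
from decimal import Decimal

# A made-up field definition carrying both the existing default and the
# proposed human-readable one.
FIELD = {
    "name": "price",
    "type": {"type": "bytes", "logicalType": "decimal",
             "precision": 9, "scale": 2},
    "default": "\u0000",        # existing form: bytes of the unscaled value
    "logical-default": "1.50",  # proposed form (hypothetical field name)
}


def bytes_default_to_decimal(text: str, scale: int) -> Decimal:
    # Code points 0-255 map to bytes; bytes are big-endian two's complement.
    unscaled = int.from_bytes(text.encode("latin-1"), "big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


def old_reader_default(field: dict) -> Decimal:
    # An older library knows nothing about the new field and ignores it.
    return bytes_default_to_decimal(field["default"], field["type"]["scale"])


def new_reader_default(field: dict) -> Decimal:
    # A newer library would prefer the human-readable field when present.
    if "logical-default" in field:
        return Decimal(field["logical-default"])
    return old_reader_default(field)


print(old_reader_default(FIELD))
print(new_reader_default(FIELD))
```

With the same schema, the two readers end up with different defaults
unless the schema author keeps both fields in sync by hand.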

Introducing human-readable values in the JSON encoding is even more clearly
a breaking change. (For JSON we could instead add the human-readable value
as a separate extra field that gets ignored when reading. The problem is
that users may be tempted to edit that value and then be surprised when the
change has no effect. It's a pity that JSON does not allow comments.)

In your opinions, what would be the best way to deal with this problem?

Thanks,

Zoltan
