I really like the idea of having support for human-readable default values.
I think I prefer to keep the way defaults are interpreted separate from
logical types, since logical types are basically optional. I would
be surprised if my language of choice could understand an ISO-8601
formatted local-date for a field default based on logical type, but I still
had to interface with a numeric value in my code.
If this doesn't conflict too much with the default value for record fields
(?), I would suggest having an object syntax with a "parser" or "type"
field in addition to the default property.
A sample record:
{
  "type": "record",
  "name": "Foo",
  "fields": [
    {
      "name": "body",
      "type": "bytes",
      "default": {
        "value": "aGVsbG8gd29ybGQ",
        "parser": "base64",
        "doc": "'hello world' as a base64-encoded string"
      }
    }
  ]
}
If changing the "default" property like that has too many issues, I suppose
a parallel "default-parser" property would do the trick too.
I think this type of approach keeps us neatly separated from logical types,
so that having a parser for a default value doesn't require a logical type,
and maybe makes it clearer which procedure is being performed on the JSON
data to convert it to the base field type.
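
For what it's worth, here is a rough Python sketch of how a reader might
resolve a "bytes" default under either syntax. The PARSERS registry, the
resolve_bytes_default helper and the "default-parser" handling are all
hypothetical; nothing like this exists in Avro today. The fallback branch
follows the current convention, where each code point 0-255 in the JSON
string stands for one byte.

```python
import base64


def _decode_base64(text):
    # Accept unpadded input by restoring the trailing "=" characters.
    return base64.b64decode(text + "=" * (-len(text) % 4))


def _decode_legacy_bytes(text):
    # Current Avro convention: code points 0-255 map to byte values 0-255.
    return text.encode("latin-1")


# Hypothetical registry of parser names; not part of any Avro release.
PARSERS = {"base64": _decode_base64}


def resolve_bytes_default(field):
    """Return the default of a "bytes" field as a bytes object."""
    default = field["default"]
    if isinstance(default, dict):
        # Proposed object syntax: {"value": ..., "parser": ...}
        return PARSERS[default["parser"]](default["value"])
    parser = field.get("default-parser")
    if parser is not None:
        # Proposed parallel-property syntax.
        return PARSERS[parser](default)
    # No parser given: fall back to the existing interpretation.
    return _decode_legacy_bytes(default)
```

With this sketch, the "body" field from the sample record resolves to the
bytes for 'hello world', and a plain string default still goes through the
existing byte-per-code-point path. It only covers "bytes" fields; a record
field would need some way to tell the object syntax apart from a regular
record default.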
-Bridger Howell
On Tue, Oct 17, 2017 at 9:57 AM, Ryan Blue <[email protected]>
wrote:
> I think that the parsing canonical form of a schema
> <https://avro.apache.org/docs/1.8.2/spec.html#Parsing+Canonical+Form+for+Schemas>
> doesn't include the default. I think that makes sense because the canonical
> form is what's needed to read encoded data. Anyone with more context: is
> that correct?
>
> In my opinion, that makes how we handle defaults a bit more flexible
> because schemas with different defaults are "the same". I'd support adding
> a new default field that handles values more naturally. We've always had a
> problem with binary as well and I'd like to see us use base64 encoded
> values instead of the current strategy.
>
> rb
>
> On Tue, Oct 17, 2017 at 8:16 AM, Zoltan Ivanfi <[email protected]> wrote:
>
> > Hi,
> >
> > I would like to start a discussion about making default values and values
> > in general human-readable for logical types.
> >
> > Currently default values for logical types have to be specified in a JSON
> > string as the binary representation of the backing primary type (e.g.,
> > "\u0000"). Some users intuitively try to specify a human-readable logical
> > value in this string instead (e.g., "0.00"). This is of course a valid byte
> > sequence and as such is accepted, but it results in unexpected behaviour (a
> > different default value than intended). Apart from being error prone,
> > specifying default values this way is also tedious. To keep this e-mail
> > brief, I won't list specific examples here, please see AVRO-2087
> > <https://issues.apache.org/jira/browse/AVRO-2087> for details instead.
> >
> > The problem of non-human-readable values applies to JSON encoding of actual
> > data as well. One reason for using JSON is that it is human readable and
> > therefore easy to debug. Seeing "\u00018" in a JSON file is not too
> > intuitive and this specific example is actually quite misleading as well
> > (it can be easily misread as "\u0018").
> >
> > Introducing a new default value field (called human-readable-default or
> > logical-default for example) would allow easier specification of default
> > values. (It doesn't solve the problem of accidentally misusing the existing
> > field though.) It is, however, not backwards compatible. An older Avro
> > library would ignore the new field and use a different default value.
> >
> > Introducing human-readable values in the JSON encoding is even more clearly
> > a breaking change. (Although for JSON we could add the human-readable value
> > as a separate extra field that gets ignored when reading. Problem is, users
> > may be tempted to change the value and be surprised. It's a pity that JSON
> > does not allow comments.)
> >
> > In your opinions, what would be the best way to deal with this problem?
> >
> > Thanks,
> >
> > Zoltan
> >
>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix