I don't think we can change the behavior of the "default" key. Otherwise,
older readers would use the wrong value.

I suggest that we add an optional key, like "default-as-string", that is
used to fill in a missing "default" key if there is a reasonable
conversion. When the schema is written, the string default would be
converted to the normal "default" field for backward compatibility. On
read, you could supply only the string default, and it would be used
instead of the binary one. I think we could take care of this entirely in
the schema parser.
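A minimal Python sketch of that parser-side fill-in. It assumes the key is spelled `default-as-string` and handles only two illustrative conversions, a `decimal` logical type backed by `bytes` and a `date` logical type. The function names are hypothetical and not part of any Avro library. A field that already has a "default" is returned unchanged, so the binary default that older readers rely on always wins, and serializing the returned field emits a normal "default".

```python
import datetime
import decimal
import json

# Key name taken from the proposal; it is not part of the Avro specification.
STRING_DEFAULT_KEY = "default-as-string"


def _decimal_to_bytes_default(text: str, scale: int) -> str:
    """Encode a decimal string as the JSON default of a bytes-backed decimal."""
    unscaled = decimal.Decimal(text).scaleb(scale)
    if unscaled != unscaled.to_integral_value():
        raise ValueError(f"{text!r} has more than {scale} fractional digits")
    n = int(unscaled)
    # Big-endian two's complement, with one spare bit for the sign.
    raw = n.to_bytes((n.bit_length() + 8) // 8, "big", signed=True)
    # Avro's JSON form of bytes maps each byte to the code point 0-255.
    return raw.decode("latin-1")


def _date_to_days(text: str) -> int:
    """The date logical type stores days since the Unix epoch in an int."""
    return (datetime.date.fromisoformat(text) - datetime.date(1970, 1, 1)).days


def fill_default_from_string(field: dict) -> dict:
    """Return a copy of `field` whose missing "default" is derived from the string form."""
    if "default" in field or STRING_DEFAULT_KEY not in field:
        return field  # an explicit binary default always takes precedence
    ftype = field["type"]
    logical = ftype.get("logicalType") if isinstance(ftype, dict) else None
    text = field[STRING_DEFAULT_KEY]
    filled = dict(field)
    if logical == "decimal" and ftype.get("type") == "bytes":
        filled["default"] = _decimal_to_bytes_default(text, ftype.get("scale", 0))
    elif logical == "date":
        filled["default"] = _date_to_days(text)
    else:
        raise ValueError(f"no string conversion for field {field['name']!r}")
    return filled


price = {
    "name": "price",
    "type": {"type": "bytes", "logicalType": "decimal", "precision": 9, "scale": 2},
    "default-as-string": "3.12",
}
print(json.dumps(fill_default_from_string(price)["default"]))  # "\u00018"
```

Raising on an unknown conversion, rather than silently skipping it, matches the "if there is a reasonable conversion" condition: a schema that cannot be filled in is rejected at parse time.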

rb

On Tue, Oct 17, 2017 at 11:53 PM, Bridger Howell <[email protected]> wrote:

> I really like the idea of having support for human-readable default values.
>
> I think I prefer to keep the way defaults are interpreted separate from
> logical types, since logical types are basically optional. I would
> be surprised if my language of choice could understand an ISO-8601
> formatted local-date for a field default based on logical type, but I still
> had to interface with a numeric value in my code.
>
> If this doesn't conflict too much with the default value for record fields
> (?), I would suggest having an object syntax with a "parser" or "type"
> field in addition to the default property.
>
> A sample record:
> {
>   "type": "record",
>   "name": "Foo",
>   "fields": [
>     {
>       "name": "body",
>       "type": "bytes",
>       "default": {
>         "value": "aGVsbG8gd29ybGQ=",
>         "parser": "base64",
>         "doc": "'hello world' as a base64-encoded string"
>       }
>     }
>   ]
> }
>
> If changing the "default" property like that has too many issues, I suppose
> a parallel "default-parser" property would do the trick too.
>
> I think this type of approach keeps us neatly separated from logical types,
> so that having a parser for a default value doesn't require a logical type,
> and maybe makes it clearer which procedure is being performed on the JSON
> data to convert it to the base field type.
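One way a reader could resolve the object syntax proposed above, sketched in Python. The parser registry and `resolve_default` are hypothetical. Only the "value"/"parser" keys and the `base64` parser name come from the sample record. Because a record or map default is already a JSON object, this sketch treats {"value": ..., "parser": ...} as parser syntax only for other field types. That is one possible answer to the conflict flagged above. References to named types are not resolved here.

```python
import base64
import json

# Hypothetical registry mapping a "parser" name to a function that turns the
# JSON value into the field's base type.
PARSERS = {
    "base64": lambda text: base64.b64decode(text, validate=True),
}


def resolve_default(field: dict):
    """Return the default for `field`, applying a parser if one is named."""
    default = field.get("default")
    ftype = field["type"]
    type_name = ftype.get("type") if isinstance(ftype, dict) else ftype
    # Record and map defaults are JSON objects themselves, so the object
    # syntax is only recognised for other types. Named-type references
    # (e.g. "type": "Foo") are not resolved in this sketch.
    if isinstance(default, dict) and "parser" in default and type_name not in ("record", "map"):
        parser = PARSERS.get(default["parser"])
        if parser is None:
            raise ValueError(f"unknown default parser {default['parser']!r}")
        return parser(default["value"])
    if type_name == "bytes" and isinstance(default, str):
        # Existing behaviour: code points 0-255 map directly to byte values.
        return default.encode("latin-1")
    return default


body = json.loads("""
{"name": "body", "type": "bytes",
 "default": {"value": "aGVsbG8gd29ybGQ=", "parser": "base64"}}
""")
print(resolve_default(body))  # b'hello world'
```

Passing `validate=True` makes malformed input, such as the unpadded "aGVsbG8gd29ybGQ", fail loudly instead of producing a surprising default.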
>
> -Bridger Howell
>
> On Tue, Oct 17, 2017 at 9:57 AM, Ryan Blue <[email protected]>
> wrote:
>
> > I think that the parsing canonical form of a schema
> > <https://avro.apache.org/docs/1.8.2/spec.html#Parsing+Canonical+Form+for+Schemas>
> > doesn't include the default. I think that makes sense because the
> > canonical form is what's needed to read encoded data. Anyone with more
> > context: is that correct?
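To illustrate the point about defaults, here is a simplified Python sketch of the spec's [STRIP], [ORDER], [PRIMITIVES], and [WHITESPACE] transformations. It assumes schemas without namespaces (the [FULLNAMES] rule is skipped), so it is not a complete canonicalizer. Two schemas that differ only in a field's "default" produce the same canonical string.

```python
import json

# Attributes kept by [STRIP], listed in the order required by [ORDER].
KEPT = ["name", "type", "fields", "symbols", "items", "values", "size"]
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def _strip_schema(schema):
    if isinstance(schema, list):  # union: strip each branch
        return [_strip_schema(branch) for branch in schema]
    if not isinstance(schema, dict):
        return schema  # primitive or named-type reference as a bare string
    if isinstance(schema.get("type"), str) and schema["type"] in PRIMITIVES:
        return schema["type"]  # [PRIMITIVES]: {"type": "int", ...} -> "int"
    out = {}
    for key in KEPT:
        if key not in schema:
            continue
        value = schema[key]
        if key in ("type", "items", "values"):
            value = _strip_schema(value)
        elif key == "fields":
            # Field objects keep only name and type; default, doc, order go.
            value = [{"name": f["name"], "type": _strip_schema(f["type"])} for f in value]
        out[key] = value
    return out


def canonical_form(schema) -> str:
    # [WHITESPACE]: no spaces; non-ASCII characters are emitted as-is.
    return json.dumps(_strip_schema(schema), separators=(",", ":"), ensure_ascii=False)


fields = [{"name": "amount",
           "type": {"type": "bytes", "logicalType": "decimal", "precision": 9, "scale": 2},
           "default": "\u0000"}]
a = {"type": "record", "name": "Foo", "fields": fields}
b = {"type": "record", "name": "Foo", "fields": [dict(fields[0], default="0.00")]}
print(canonical_form(a) == canonical_form(b))  # True
```

Note that [STRIP] also drops "logicalType", "precision", and "scale", so under this form the decimal field collapses to plain "bytes".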
> >
> > In my opinion, that makes how we handle defaults a bit more flexible
> > because schemas with different defaults are "the same". I'd support
> > adding a new default field that handles values more naturally. We've
> > always had a problem with binary as well and I'd like to see us use
> > base64 encoded values instead of the current strategy.
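For comparison, a short Python sketch of the same bytes in the two representations: the current JSON form, where each byte becomes the Unicode code point with the same value (0-255), and base64. The helper names are illustrative only.

```python
import base64
import json


def bytes_to_avro_json(data: bytes) -> str:
    # Current spec behaviour: byte value n becomes code point n.
    return data.decode("latin-1")


def bytes_to_base64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


data = bytes([0x00, 0x01, 0x38, 0xFF])
print(json.dumps(bytes_to_avro_json(data)))  # "\u0000\u00018\u00ff"
print(json.dumps(bytes_to_base64(data)))     # "AAE4/w=="

# Both forms round-trip to the original bytes.
assert bytes_to_avro_json(data).encode("latin-1") == data
assert base64.b64decode(bytes_to_base64(data)) == data
```

The base64 string is plain ASCII with no escapes to misread, although neither form is human-readable for logical values such as decimals.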
> >
> > rb
> >
> > On Tue, Oct 17, 2017 at 8:16 AM, Zoltan Ivanfi <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I would like to start a discussion about making default values and
> > > values in general human-readable for logical types.
> > >
> > > Currently, default values for logical types have to be specified in a
> > > JSON string as the binary representation of the backing primitive type
> > > (e.g., "\u0000"). Some users intuitively try to specify a human-readable
> > > logical value in this string instead (e.g., "0.00"). This is of course a
> > > valid byte sequence and as such is accepted, but it results in
> > > unexpected behaviour (a different default value than intended). Apart
> > > from being error-prone, specifying default values this way is also
> > > tedious. To keep this e-mail brief, I won't list specific examples here;
> > > please see AVRO-2087 <https://issues.apache.org/jira/browse/AVRO-2087>
> > > for details instead.
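To make the "0.00" pitfall concrete, the Python sketch below decodes a bytes default the way a decimal reader would. Code points become bytes, and those bytes are read as a big-endian two's-complement unscaled integer. It assumes a `decimal` logical type with scale 2, and the function name is hypothetical.

```python
from decimal import Decimal


def decimal_from_bytes_default(default: str, scale: int) -> Decimal:
    """Interpret the JSON default of a bytes-backed decimal field."""
    raw = default.encode("latin-1")  # one byte per code point 0-255
    unscaled = int.from_bytes(raw, "big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


print(decimal_from_bytes_default("\u0000", 2))  # 0.00
print(decimal_from_bytes_default("0.00", 2))    # 8083333.60
```

The four characters of "0.00" are the bytes 0x30 0x2E 0x30 0x30, so under this assumed scale the default silently becomes 8083333.60 instead of zero.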
> > >
> > > The problem of non-human-readable values applies to the JSON encoding of
> > > actual data as well. One reason for using JSON is that it is human
> > > readable and therefore easy to debug. Seeing "\u00018" in a JSON file is
> > > not very intuitive, and this specific example is actually quite
> > > misleading as well (it can easily be misread as "\u0018").
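The misreading risk can be shown with Python's `json` module. Assume the value is a `decimal` with scale 2 (the scale is not stated above). Under that assumption, the two strings differ by a single character but decode to quite different numbers. The helper name is hypothetical.

```python
import json
from decimal import Decimal


def decode_json_decimal(json_text: str, scale: int) -> Decimal:
    """Decode a bytes-backed decimal from its Avro JSON encoding."""
    raw = json.loads(json_text).encode("latin-1")
    return Decimal(int.from_bytes(raw, "big", signed=True)).scaleb(-scale)


intended = r'"\u00018"'  # two bytes: 0x01 followed by "8" (0x38)
misread = r'"\u0018"'    # one byte: 0x18
print(decode_json_decimal(intended, 2))  # 3.12
print(decode_json_decimal(misread, 2))   # 0.24
```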
> > >
> > > Introducing a new default value field (called human-readable-default or
> > > logical-default, for example) would allow easier specification of
> > > default values. (It doesn't solve the problem of accidentally misusing
> > > the existing field, though.) It is, however, not backwards compatible:
> > > an older Avro library would ignore the new field and use a different
> > > default value.
> > >
> > > Introducing human-readable values in the JSON encoding is even more
> > > clearly a breaking change. (Although for JSON we could add the
> > > human-readable value as a separate extra field that gets ignored when
> > > reading. The problem is that users may be tempted to change that value
> > > and be surprised when it has no effect. It's a pity that JSON does not
> > > allow comments.)
> > >
> > > In your opinion, what would be the best way to deal with this problem?
> > >
> > > Thanks,
> > >
> > > Zoltan
> > >
> >
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
>



