Re: (Default) values for logical types in human-readable form

Zoltan Ivanfi Thu, 19 Oct 2017 09:00:16 -0700

Hi,

On Thu, Oct 19, 2017 at 7:16 AM, Bridger Howell <bhow...@sofi.org> wrote:

> So then if an older reader reads a schema field with "default-as-string"
> used instead of "default", it will decide that field has no default? I
> don't really like that, but it's better than using the wrong value (e.g.
> "default" + "default-parser")

I think ignoring the user-specified default value is just as bad as using a
wrong value. I consider both to be equally breaking changes.

> or erroring on most data reads (changing the "default" field to an object).

But if we can't make this feature non-breaking and have to put it in a new
major release, then I think that it's better to cause an explicit error in
old versions rather than silently getting unexpected behaviour.

> I don't think we can make old readers fail
> properly, since they would have to already have the future knowledge that
> there is supposed to be a default value. Someone correct me if I'm wrong on
> this.
>

What do you mean by failing properly? I think specifying a value that does
not belong to the types allowed by the older specification can reliably
cause a failure, albeit certainly not with an error message that would
describe the cause properly.
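
To illustrate the difference between the two failure modes, here is a toy
Python sketch. It is not the behaviour of any actual Avro library:
old_reader_default and NO_DEFAULT are hypothetical names, and the sketch
assumes an old reader that validates the "default" value against the field
type, which a given implementation may or may not do.

```python
NO_DEFAULT = object()  # sentinel: the field declares no default at all


def old_reader_default(field):
    """Toy model of a reader that predates "default-as-string"."""
    if "default" not in field:
        # Unknown keys are skipped, so the user's default is silently lost.
        return NO_DEFAULT
    default = field["default"]
    # Assumption: this reader validates defaults; for bytes it expects a JSON string.
    if field["type"] == "bytes" and not isinstance(default, str):
        raise TypeError(
            f"invalid default for bytes field {field['name']!r}: {default!r}"
        )
    return default


new_key_field = {"name": "num", "type": "bytes", "default-as-string": "31.80"}
object_default_field = {"name": "num", "type": "bytes", "default": {"decimal": "31.80"}}

# The new key is ignored: no error, but also no default.
print(old_reader_default(new_key_field) is NO_DEFAULT)

# A default of a disallowed type fails loudly, although the message
# cannot explain that a newer schema feature is the cause.
try:
    old_reader_default(object_default_field)
except TypeError as err:
    print("explicit failure:", err)
```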

On Wed, Oct 18, 2017 at 9:56 AM, Ryan Blue <rb...@netflix.com.invalid>
wrote:

> I suggest that we add an optional key, like "default-as-string", that is
> used to fill in a missing "default" key if there is a reasonable
> conversion.

This would still be a breaking change though, since older versions will
ignore the "default-as-string" field.

> On write, the write schema would convert to the normal
> "default" field for backward-compatibility.

I'm sorry, I can't quite follow, could you please elaborate?

> On read, you can supply only
> the string default to use that instead of the binary one.

I don't understand this either, could you please explain this through an
example?

On Tue, Oct 17, 2017 at 11:53 PM, Bridger Howell <bhow...@sofi.org> wrote:

> > I really like the idea of having support for human-readable default
> > values.
> >
> > I think I prefer to keep the way defaults are interpreted separate from
> > logical types, since logical types are basically optional.

So if I understand correctly, you support the idea of human-readable
defaults but not logical-type-dependent interpretation. I don't see how we
could achieve the first without the second, since different logical types
have different human-readable representations. So it seems that the
optional nature of logical types actually makes this feature impossible.

> >         "doc": "'hello world' as a base64-encoded string"

I like the idea of having a doc field. This comes closest to the much-desired
ability to comment JSON. I don't see how this would help with
default values in schemas, since schemas are written directly by users. (Or
is there a tool for doing so?) However, we could do this with the actual
values written to JSON as well. As I wrote earlier, I was afraid to suggest
an additional field like this:

"num": "\u000C\u006C",
"num-human-readable": 31.80

Because users may be tempted to modify the "num-human-readable" field,
thinking that the change will have some effect. However, if we use a doc
string instead:

"num": "\u000C\u006C",
"num-doc": "binary representation of the decimal value 31.80"

then I think most users will realize that they can't modify the value of
"num" by modifying "num-doc".

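To make the relationship between the two representations concrete, here is
a minimal Python sketch that decodes the "num" value above. It assumes the
field is of type bytes with a decimal logical type of scale 2 (the scale is
not stated above, only implied by 31.80). decode_decimal_default is a
hypothetical helper, not part of any Avro library.

```python
import json
from decimal import Decimal


def decode_decimal_default(json_text, scale):
    # In Avro's JSON form for bytes, each code point 0-255 stands for one byte.
    raw = json.loads(json_text).encode("latin-1")
    # The decimal logical type stores the unscaled value as a
    # big-endian two's-complement integer.
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    # Shift the decimal point left by `scale` digits.
    return Decimal(unscaled).scaleb(-scale)


value = decode_decimal_default(r'"\u000C\u006C"', scale=2)
print(value)  # 0x0C6C = 3180, so with scale 2 this prints 31.80
```
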
> > I think this type of approach keeps us neatly separated from logical
> > types, so that having a parser for a default value doesn't require a
> > logical type,
Wouldn't the separate parser approach lead to the same problem in the end?
It is more general and thus allows more use-cases, but if you would like to
specify a decimal value as a number, you still have to have a parser
implemented for it.

> > > I think that the parsing canonical form of a schema
> > > <https://avro.apache.org/docs/1.8.2/spec.html#Parsing+Canonical+Form+for+Schemas>
> > > doesn't include the default. I think that makes sense because the
> > > canonical form is what's needed to read encoded data.

That's strange, since according to the specification, the default is used
when reading instances that lack a value for the field, so I think it is
needed for reading encoded data.
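
A simplified Python sketch of why the default matters at read time follows.
It models only the record-field part of schema resolution, with decoded
records as plain dicts. resolve_record is a hypothetical helper and not an
Avro API.

```python
def resolve_record(writer_record, reader_fields):
    """Fill in reader fields that the writer did not provide."""
    result = {}
    for field in reader_fields:
        name = field["name"]
        if name in writer_record:
            result[name] = writer_record[name]
        elif "default" in field:
            # The reader's default stands in for the missing value.
            result[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return result


reader_fields = [
    {"name": "id", "type": "long"},
    {"name": "num", "type": "bytes", "default": "\u000C\u006C"},
]

# The writer's schema had no "num" field, so the default is used.
resolved = resolve_record({"id": 1}, reader_fields)
print(resolved)
```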

So far the discussion has focused on default values in schemas. I would
also encourage everyone to share their opinions about actual data written
using JSON encoding.

Br,

Zoltan
