Hi,

On Thu, Oct 19, 2017 at 7:16 AM, Bridger Howell <bhow...@sofi.org> wrote:
> So then if an older reader reads a schema field with "default-as-string"
> used instead of "default", it will decide that field has no default? I
> don't really like that, but it's better than using the wrong value (e.g.
> "default" + "default-parser")

I think ignoring the user-specified default value is just as bad as using a
wrong value. I consider both to be equally breaking changes.

> or erroring on most data reads (changing the "default" field to an object).

But if we can't make this feature non-breaking and have to put it in a new
major release, then I think it's better to cause an explicit error in old
versions than to silently get unexpected behaviour.

> I don't think we can make old readers fail
> properly, since they would have to already have the future knowledge that
> there is supposed to be a default value. Someone correct me if I'm wrong on
> this.

What do you mean by failing properly? I think specifying a value that does
not belong to the types allowed by the older specification can reliably
cause a failure, albeit certainly not with an error message that would
describe the cause properly.

On Wed, Oct 18, 2017 at 9:56 AM, Ryan Blue <rb...@netflix.com.invalid> wrote:
> I suggest that we add an optional key, like "default-as-string", that is
> used to fill in a missing "default" key if there is a reasonable
> conversion.

This would still be a breaking change though, since older versions will
ignore the "default-as-string" field.

> On write, the write schema would convert to the normal
> "default" field for backward-compatibility.

I'm sorry, I can't quite follow. Could you please elaborate?

> On read, you can supply only
> the string default to use that instead of the binary one.

I don't understand this either. Could you please explain it through an
example?

On Tue, Oct 17, 2017 at 11:53 PM, Bridger Howell <bhow...@sofi.org> wrote:
> > I really like the idea of having support for human-readable default
> > values.
> >
> > I think I prefer to keep the way defaults are interpreted separate from
> > logical types, since logical types are basically optional.

So if I understand correctly, you support the idea of human-readable
defaults but not logical-type-dependent interpretation. I don't see how we
could achieve the first without the second, since different logical types
have different human-readable representations. So it seems that the optional
nature of logical types actually makes this feature impossible.

> > "doc": "'hello world' as a base64-encoded string"

I like the idea of having a doc field. This comes closest to the
much-desired ability to comment JSON. I don't see how this would help with
default values in schemas, since schemas are written directly by users. (Or
is there a tool for doing so?) However, we could do this with the actual
values written to JSON as well. As I wrote earlier, I was afraid to suggest
an additional field like this:

    "num": "\u000C\u006C",
    "num-human-readable": 31.80

because users may be tempted to modify the "num-human-readable" field,
thinking that the change will have some effect. However, if we use a doc
string instead:

    "num": "\u000C\u006C",
    "num-doc": "binary representation of the decimal value 31.80"

then I think most users will realize that they can't modify the value of
"num" by modifying "num-doc".

> > I think this type of approach keeps us neatly separated from logical types,
> > so that having a parser for a default value doesn't require a logical type,

Wouldn't the separate parser approach lead to the same problem in the end?
It is more general and thus allows more use-cases, but if you would like to
specify a decimal value as a number, you still have to have a parser
implemented for it.

> > > I think that the parsing canonical form of a schema
> > > <https://avro.apache.org/docs/1.8.2/spec.html#Parsing+Canonical+Form+for+Schemas>
> > > doesn't include the default.
> > > I think that makes sense because the
> > > canonical form is what's needed to read encoded data.

That's strange, since according to the specification, the default is used
when reading instances that lack a value for the field, so I think it is
needed for reading encoded data.

So far the discussion has focused on default values in schemas. I would
encourage everyone to also share their opinions about actual data written
using JSON encoding.

Br,

Zoltan
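
P.S. For anyone who wants to check the "num" example above, here is a minimal Python sketch of how the string "\u000C\u006C" relates to 31.80. It assumes the usual Avro conventions: bytes are written to JSON as a string whose code points are the byte values, and a decimal is stored as the big-endian two's-complement unscaled integer. The scale of 2 is an assumption (the example does not state it), and both function names are hypothetical helpers, not part of any Avro library.

```python
from decimal import Decimal


def decode_avro_json_decimal(json_str: str, scale: int) -> Decimal:
    """Hypothetical helper: JSON-encoded bytes string -> decimal value."""
    # Each code point (0-255) in the JSON string stands for one byte.
    raw = json_str.encode("latin-1")
    # The bytes hold the unscaled value, big-endian two's complement.
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    # Shift the decimal point left by `scale` digits.
    return Decimal(unscaled).scaleb(-scale)


def encode_avro_json_decimal(value: Decimal, scale: int) -> str:
    """Hypothetical helper: decimal value -> JSON-encoded bytes string.

    Assumes `value` has at most `scale` fractional digits.
    """
    unscaled = int(value.scaleb(scale))
    # Enough bytes for the magnitude plus a sign bit (not always minimal).
    length = (unscaled.bit_length() + 8) // 8
    raw = unscaled.to_bytes(length, byteorder="big", signed=True)
    return raw.decode("latin-1")


if __name__ == "__main__":
    # 0x0C6C == 3180, and 3180 with scale 2 is 31.80.
    print(decode_avro_json_decimal("\u000c\u006c", 2))
```

This also shows why a human cannot be expected to read or edit such values directly: changing 31.80 to 31.81 means recomputing the bytes, which is what a "num-doc" style annotation would at least make visible.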