Hello! In my experience, the overwhelming majority of UNION are for "optional" or "nullable" fields, and I really like the versatility of the "optional?" feature you've added to IDL. It's an obviously desirable and compelling improvement!
Outside of IDL, I've been caught on this before (and so have others, which is why the Java Schema.Parser class can disable validating defaults). That being said, there's another disadvantage with backwards and forwards compatibility for 2 + 3 that I didn't think of earlier. If you aren't using schema evolution, you can currently ignore the default value for serializing and deserializing data. If we were to reorder any or some UNIONs to match the default value type correctly, it would change the binary serialized data. I suspect this would cause more problems than it would solve, especially because rewriting the order in the UNION would not be very obvious. Something to think about -- what would happen if we were to say in the spec that the default value for a UNION must match either the first non-NULL schema, but can also be null if NULL is present at any position in the UNION? This would have similar compatibility impacts as #2 but without having to rewrite the UNION schema. All my best -- and it's always worth repeating that I'm appreciating and learning a lot from your work on the IDL side! Ryan On Fri, Dec 10, 2021 at 7:30 PM Oscar Westra van Holthe - Kind <[email protected]> wrote: > > Hi everyone, > > In PR #1411 <https://github.com/apache/avro/pull/1411> for issue AVRO-3257 > <https://issues.apache.org/jira/browse/AVRO-3257>, I've added some code to > change/fix a union based on its default value. > Currently, the change is in idl.jj, line 186-210 > <https://github.com/apache/avro/pull/1411/files/6b2d0b20981d81adde0059ee76441dc80de93c29#diff-376865a18691674de38817b4eea7e64c2c3848094a4730030c67f57f54f1028aR186> > . > > Poll: please state your preference: > > 1. Keep the change in idl.jj (in avro-compiler) > Backwards compatibility: perfect > Forwards compatibility: perfect, except for IDL files that use the new ? > type suffix > Advantage: it's only a local change, specific to the ? suffix, so > guaranteed not to introduce bugs > Disadvantage: people who prefer the old syntax cannot benefit > 2. Move the change to the Schema class (in avro-core), and adjust for > all optional schemas (unions of one schema with null) > Backwards compatibility: perfect > Forwards compatibility: partial (JSON schemas depending on the new > feature break in earlier versions) > Advantage: works for all optional schemas, both in IDL and JSON files. > Disadvantage: although I'm fairly certain it would work, this does > affect schema parsing; code I'm not as familiar with > 3. Move the change to the Schema class, and adjust for all unions (move > the first schema for which the default value validates to the front) > Backwards compatibility: perfect > Forwards compatibility: partial (JSON schemas depending on the new > feature break in earlier versions) > Advantage: works for all unions, both in IDL and JSON files. > Disadvantage: although I'm fairly certain it would work, this does > affect schema parsing; code I'm not as familiar with > > Note that all solutions keep the in-memory representation of Schema > instances intact. As a result, (de)serialisation, the schema fingerprint, > etc. are not affected. > > Kind regards, > Oscar
