On Thu, Feb 10, 2022 at 10:47 PM Ryan Skraba <[email protected]> wrote:
> I created https://github.com/apache/avro/pull/1533 to show the impact > on the Python SDK, if we were to follow the "permissive" behaviour and > allow complex types as names. > > It looks like just by disabling the check, Python can already > disambiguate correctly and use these names as type references. > > Even using these types as names might be confusing to the user, at > this point, I think it's probably for the best to adopt the Java > behaviour. Martin and Oscar, were your preferences strongly held? I > don't mind continuing to discuss or explore this change! > I've changed my mind and I agree now that the permissive behavior in the Java impl is not a problem! I also created a PR for the Rust module: https://github.com/apache/avro/pull/1525 > > Best regards, Ryan > > On Mon, Feb 7, 2022 at 7:09 PM Ryan Skraba <[email protected]> wrote: > > > > I created AVRO-3374 for the "mynamespace.int" oddness, where removing > > the inherited namespace changes the type. It's kind of a different > > issue, but it illustrates why a unqualified primitive type can never > > be a full name -- if it were, a type reference could never be > > distinguished from the simple primitive. > > > > On the other hand, the complex types we've listed here (such as a > > record named enum) can always be distinguished whether it's a named > > type reference: > > > > // This is an enum > > {"type": "enum", "name": "MyEnum", "symbols": ["A", B", "C"]} > > > > // This must be a type reference to a previously defined named > > // type because it's otherwise incomplete > > {"type": "enum"} > > > > If we specify the python (strict) behaviour, the spec is a little > > nicer, but we'll break Java code that has been happily generating > > records named "record" or "error" in previous versions. This code > > does exist in production (we've encountered both those names, and we > > can see that Flink has been generating unqulaified record by default). > > > > If we specify the Java (permissive) behaviour, the spec is a bit more > > complicated, but code that used to fail in Python will now succeed. > > > > My conclusion is that I'd prefer to see the Java behaviour considered > > "correct", and a fix applied to python. > > > > I'd add a clarification to the spec that says something like: > > > > "Unlike primitive types, a complex type like "enum" can be also used > > as a name. The name can be used as a type reference later, when it is > > unambiguously _not_ a new defined type definition. This can lead to > > confusion, so it's a best practice to set a namespace on these types > > for readability." > > > > I can create a PR to demonstrate, but I'm curious if you think this > > the right thing to do! > > > > Ryan > > > > On Mon, Feb 7, 2022 at 4:22 PM Martin Grigorov <[email protected]> > wrote: > > > > > > I've added support for this to the Avro impl - > > > https://github.com/apache/avro/pull/1525 > > > I will merge it only if the Java behavior is approved. > > > > > > On Fri, Feb 4, 2022 at 7:50 PM Ryan Skraba <[email protected]> wrote: > > > > > > > Just thinking out loud here, there might be a related-but-not-quite > > > > ambiguity or bug with primitive type names! > > > > > > > > { > > > > "type" : "record", > > > > "name" : "int", > > > > "namespace" : "mynamespace", > > > > "fields" : [ > > > > { "name" : "value", "type" : "long" }, > > > > { "name" : "next", "type" : [ "null", "mynamespace.int" ] > > > > } ] > > > > } > > > > > > > > This is a valid linked list when you parse it with the Java SDK (the > > > > record has a namespace). If you get the JSON representation, it > > > > becomes: > > > > > > > > { > > > > "type" : "record", > > > > "name" : "int", > > > > "namespace" : "mynamespace", > > > > "fields" : [ > > > > { "name" : "value", "type" : "long" }, > > > > { "name" : "next", "type" : [ "null", "int" ] > > > > } ] > > > > } > > > > > > > > Which is no longer a linked list. Fun! > > > > > > > > In the meantime, I created > > > > https://issues.apache.org/jira/browse/FLINK-25962 to try and make a > > > > fix there. > > > > > > > > Best regards, Ryan > > > > > > > > > > > > > > > > On Fri, Feb 4, 2022 at 6:27 PM Oscar Westra van Holthe - Kind > > > > <[email protected]> wrote: > > > > > > > > > > On fr 4 feb. 2022 15:00, Ryan Skraba <[email protected]> wrote: > > > > > > > > > > > [...] I'm not quite sure that there *is* any ambiguity with > > > > > > the examples. A JSON object {"type": "record"} without any other > > > > > > attributes can only be valid if we look at it as a type > reference (not > > > > > > a new RECORD type). Are we allowed to add arbitrary JSON > properties > > > > > > to a type reference? > > > > > > > > > > > > > > > > No: that would either change the named type, or create a copy with > > > > changes > > > > > (and redefining a named type differently is not allowed). > > > > > > > > > > > > > > > Kind regards, > > > > > Oscar > > > > > > > > > > -- > > > > > Oscar Westra van Holthe - Kind <[email protected]> > > > > > > > > > > > > > > > >
