On Thu, Feb 10, 2022 at 10:47 PM Ryan Skraba <[email protected]> wrote:

> I created https://github.com/apache/avro/pull/1533 to show the impact
> on the Python SDK, if we were to follow the "permissive" behaviour and
> allow complex types as names.
>
> It looks like just by disabling the check, Python can already
> disambiguate correctly and use these names as type references.
>
> Even using these types as names might be confusing to the user, at
> this point, I think it's probably for the best to adopt the Java
> behaviour.  Martin and Oscar, were your preferences strongly held?  I
> don't mind continuing to discuss or explore this change!
>

I've changed my mind and I agree now that the permissive behavior in the
Java impl is not a problem!
I also created a PR for the Rust module:
https://github.com/apache/avro/pull/1525



>
> Best regards, Ryan
>
> On Mon, Feb 7, 2022 at 7:09 PM Ryan Skraba <[email protected]> wrote:
> >
> > I created AVRO-3374 for the "mynamespace.int" oddness, where removing
> > the inherited namespace changes the type.  It's kind of a different
> > issue, but it illustrates why a unqualified primitive type can never
> > be a full name -- if it were, a type reference could never be
> > distinguished from the simple primitive.
> >
> > On the other hand, the complex types we've listed here (such as a
> > record named enum) can always be distinguished whether it's a named
> > type reference:
> >
> > // This is an enum
> > {"type": "enum", "name": "MyEnum", "symbols": ["A", B", "C"]}
> >
> > // This must be a type reference to a previously defined named
> > // type because it's otherwise incomplete
> > {"type": "enum"}
> >
> > If we specify the python (strict) behaviour, the spec is a little
> > nicer, but we'll break Java code that has been happily generating
> > records named "record" or "error" in previous versions.  This code
> > does exist in production (we've encountered both those names, and we
> > can see that Flink has been generating unqulaified record by default).
> >
> > If we specify the Java (permissive) behaviour, the spec is a bit more
> > complicated, but code that used to fail in Python will now succeed.
> >
> > My conclusion is that I'd prefer to see the Java behaviour considered
> > "correct", and a fix applied to python.
> >
> > I'd add a clarification to the spec that says something like:
> >
> > "Unlike primitive types, a complex type like "enum" can be also used
> > as a name.  The name can be used as a type reference later, when it is
> > unambiguously _not_ a new defined type definition.  This can lead to
> > confusion, so it's a best practice to set a namespace on these types
> > for readability."
> >
> > I can create a PR to demonstrate, but I'm curious if you think this
> > the right thing to do!
> >
> > Ryan
> >
> > On Mon, Feb 7, 2022 at 4:22 PM Martin Grigorov <[email protected]>
> wrote:
> > >
> > > I've added support for this to the Avro impl -
> > > https://github.com/apache/avro/pull/1525
> > > I will merge it only if the Java behavior is approved.
> > >
> > > On Fri, Feb 4, 2022 at 7:50 PM Ryan Skraba <[email protected]> wrote:
> > >
> > > > Just thinking out loud here, there might be a related-but-not-quite
> > > > ambiguity or bug with primitive type names!
> > > >
> > > > {
> > > >   "type" : "record",
> > > >   "name" : "int",
> > > >   "namespace" : "mynamespace",
> > > >   "fields" : [
> > > >      { "name" : "value", "type" : "long" },
> > > >      { "name" : "next", "type" : [ "null", "mynamespace.int" ]
> > > >   } ]
> > > > }
> > > >
> > > > This is a valid linked list when you parse it with the Java SDK (the
> > > > record has a namespace).  If you get the JSON representation, it
> > > > becomes:
> > > >
> > > > {
> > > >   "type" : "record",
> > > >   "name" : "int",
> > > >   "namespace" : "mynamespace",
> > > >   "fields" : [
> > > >      { "name" : "value", "type" : "long" },
> > > >      { "name" : "next", "type" : [ "null", "int" ]
> > > >   } ]
> > > > }
> > > >
> > > > Which is no longer a linked list.  Fun!
> > > >
> > > > In the meantime, I created
> > > > https://issues.apache.org/jira/browse/FLINK-25962 to try and make a
> > > > fix there.
> > > >
> > > > Best regards, Ryan
> > > >
> > > >
> > > >
> > > > On Fri, Feb 4, 2022 at 6:27 PM Oscar Westra van Holthe - Kind
> > > > <[email protected]> wrote:
> > > > >
> > > > > On fr 4 feb. 2022 15:00, Ryan Skraba <[email protected]> wrote:
> > > > >
> > > > > > [...] I'm not quite sure that there *is* any ambiguity with
> > > > > > the examples.  A JSON object {"type": "record"} without any other
> > > > > > attributes can only be valid if we look at it as a type
> reference (not
> > > > > > a new RECORD type).  Are we allowed to add arbitrary JSON
> properties
> > > > > > to a type reference?
> > > > > >
> > > > >
> > > > > No: that would either change the named type, or create a copy with
> > > > changes
> > > > > (and redefining a named type differently is not allowed).
> > > > >
> > > > >
> > > > > Kind regards,
> > > > > Oscar
> > > > >
> > > > > --
> > > > > Oscar Westra van Holthe - Kind <[email protected]>
> > > > >
> > > > > >
> > > >
>

Reply via email to