I created AVRO-3374 for the "mynamespace.int" oddness, where removing
the inherited namespace changes the type. It's kind of a different
issue, but it illustrates why a unqualified primitive type can never
be a full name -- if it were, a type reference could never be
distinguished from the simple primitive.
On the other hand, the complex types we've listed here (such as a
record named enum) can always be distinguished whether it's a named
type reference:
// This is an enum
{"type": "enum", "name": "MyEnum", "symbols": ["A", B", "C"]}
// This must be a type reference to a previously defined named
// type because it's otherwise incomplete
{"type": "enum"}
If we specify the python (strict) behaviour, the spec is a little
nicer, but we'll break Java code that has been happily generating
records named "record" or "error" in previous versions. This code
does exist in production (we've encountered both those names, and we
can see that Flink has been generating unqulaified record by default).
If we specify the Java (permissive) behaviour, the spec is a bit more
complicated, but code that used to fail in Python will now succeed.
My conclusion is that I'd prefer to see the Java behaviour considered
"correct", and a fix applied to python.
I'd add a clarification to the spec that says something like:
"Unlike primitive types, a complex type like "enum" can be also used
as a name. The name can be used as a type reference later, when it is
unambiguously _not_ a new defined type definition. This can lead to
confusion, so it's a best practice to set a namespace on these types
for readability."
I can create a PR to demonstrate, but I'm curious if you think this
the right thing to do!
Ryan
On Mon, Feb 7, 2022 at 4:22 PM Martin Grigorov <[email protected]> wrote:
>
> I've added support for this to the Avro impl -
> https://github.com/apache/avro/pull/1525
> I will merge it only if the Java behavior is approved.
>
> On Fri, Feb 4, 2022 at 7:50 PM Ryan Skraba <[email protected]> wrote:
>
> > Just thinking out loud here, there might be a related-but-not-quite
> > ambiguity or bug with primitive type names!
> >
> > {
> > "type" : "record",
> > "name" : "int",
> > "namespace" : "mynamespace",
> > "fields" : [
> > { "name" : "value", "type" : "long" },
> > { "name" : "next", "type" : [ "null", "mynamespace.int" ]
> > } ]
> > }
> >
> > This is a valid linked list when you parse it with the Java SDK (the
> > record has a namespace). If you get the JSON representation, it
> > becomes:
> >
> > {
> > "type" : "record",
> > "name" : "int",
> > "namespace" : "mynamespace",
> > "fields" : [
> > { "name" : "value", "type" : "long" },
> > { "name" : "next", "type" : [ "null", "int" ]
> > } ]
> > }
> >
> > Which is no longer a linked list. Fun!
> >
> > In the meantime, I created
> > https://issues.apache.org/jira/browse/FLINK-25962 to try and make a
> > fix there.
> >
> > Best regards, Ryan
> >
> >
> >
> > On Fri, Feb 4, 2022 at 6:27 PM Oscar Westra van Holthe - Kind
> > <[email protected]> wrote:
> > >
> > > On fr 4 feb. 2022 15:00, Ryan Skraba <[email protected]> wrote:
> > >
> > > > [...] I'm not quite sure that there *is* any ambiguity with
> > > > the examples. A JSON object {"type": "record"} without any other
> > > > attributes can only be valid if we look at it as a type reference (not
> > > > a new RECORD type). Are we allowed to add arbitrary JSON properties
> > > > to a type reference?
> > > >
> > >
> > > No: that would either change the named type, or create a copy with
> > changes
> > > (and redefining a named type differently is not allowed).
> > >
> > >
> > > Kind regards,
> > > Oscar
> > >
> > > --
> > > Oscar Westra van Holthe - Kind <[email protected]>
> > >
> > > >
> >