Hello! Just to clarify -- we probably should not take into account
any implementation-specific keywords and mangling when making this
decision!
I agree it's confusing because we recently added "record" as a keyword
to be mangled (everywhere), and then recently merged AVRO-3305[1]
because it doesn't need to be mangled in enum symbols or packages.
That is entirely a Java issue with generated code, and (as Oscar
mentions) could always change with new keywords in the future.
I haven't quite decided whether I prefer the strict (Python) or
permissive (Java) version -- for example, should we forbid "union" as
a record name? Because of the way we encode the schema in JSON, it's
NEVER ambiguous as a type name or a type reference. If we forbid
"record" should we forbid "error" as well? It's not a complex type,
but it's present as a type in the context of a protocol.
That being said, I'm not quite sure that there *is* any ambiguity with
the examples. A JSON object {"type": "record"} without any other
attributes can only be valid if we look at it as a type reference (not
a new RECORD type). Are we allowed to add arbitrary JSON properties
to a type reference?
In that case, the "enum" can only be a type reference to the already
defined record named enum. Can we come up with any examples that are
actually ambiguous?
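To make the argument above concrete, here is a minimal sketch in Python.
It is not how any Avro SDK actually parses schemas; classify() and
REQUIRED are hypothetical names, and it assumes that a definition is
recognised purely by the attributes the spec requires for it.

```python
# Attributes assumed to be required for each named complex type definition.
# "error" only appears in protocols; it is included here as an assumption.
REQUIRED = {
    "record": {"name", "fields"},
    "error": {"name", "fields"},
    "enum": {"name", "symbols"},
    "fixed": {"name", "size"},
}


def classify(node, defined_names):
    """Hypothetical helper: classify a JSON object that has a "type" attribute.

    Returns "definition" when the object carries every attribute needed to
    define a new named type, "reference" when its "type" is the name of an
    already defined type, and "invalid" otherwise.
    """
    type_name = node.get("type")
    if not isinstance(type_name, str):
        # Nested schemas and unions are out of scope for this sketch.
        return "invalid"
    required = REQUIRED.get(type_name)
    if required is not None and required.issubset(node):
        return "definition"
    if type_name in defined_names:
        return "reference"
    return "invalid"


# A bare {"type": "record"} inside a record named "record" is a reference.
print(classify({"type": "record"}, {"record"}))
# The same object with no such named type in scope cannot be valid.
print(classify({"type": "record"}, set()))
# With "name" and "fields" present, it is a new definition.
print(classify({"type": "record", "name": "record", "fields": []}, set()))
```

Under those assumptions the three cases never overlap, which is why I
don't see the ambiguity yet.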
Best regards, Ryan
[1]:https://github.com/apache/avro/pull/1457 "Only mangle when necessary"
On Fri, Feb 4, 2022 at 1:40 PM Martin Grigorov <[email protected]> wrote:
>
> On Fri, Feb 4, 2022 at 2:12 PM Oscar Westra van Holthe - Kind <
> [email protected]> wrote:
>
> > On fr 4 feb. 2022 12:13, Ryan Skraba <[email protected]> wrote:
> >
> > > Hello! I created the JIRA AVRO-3370[1] that demonstrates two
> > > different behaviours between Java and python with respect to using
> > > complex types (such as "record" as a name in a named type).
> > > [...]
> > > There's probably two fixes to be done here:
> > > - Better define the behaviour so all language SDKs are consistent, and
> > > - Contribute an upstream fix to Flink so that it's compatible with
> > > Python.
> > >
> >
> > We should indeed do these things both. The only question IMHO is how.
> >
> >
> > > What do you think is the right thing to do? Should we be able to
> > > define a record named record (the Java behaviour) or should the spec
> > > be stricter about using types like names (the Python behaviour)?
> >
> >
> > Avoiding keywords for C/C++/C#, Java, Javascript, Python, Perl, Rust, ...
> > is tricky at best and usually error prone. Plus, evolving languages (like
> > the new keywords record & sealed in Java) easily break backwards
> > compatibility. As a result, I think we should let keyword conflicts for
> > e.g. generated code be handled per language (as needed), and have Avro
> > define whatever we like.
> >
> > I like flexibility, so I vote to allow any name that isn't a type
> > _reference_. This means names like string, int, null, float, etc. are not
> > allowed, but date, time_ms, record, union, etc. are.
>
>
> Wouldn't this make it hard for Schema references ?
> Consider
>
> {"type":"record","name":"record",
> "fields":[
> {"name":"value","type":"int"},
> {"name":"next","type":["null",{"type":"*record*"}]}]}
>
>
> The latter "type": "record" would be really confusing for parsing. Same for
> enum, fixed & union types.
>
> IMO Avro specification should disallow using both primitive types (string,
> int, null, float, etc.) and complex types (record, enum, union, fixed) as
> names.
> Every language impl should mangle its generated types considering its own
> keywords.
>
>
> >
> >
> > Kind regards,
> > Oscar
> >
> > --
> > Oscar Westra van Holthe - Kind <[email protected]>
> >