[ https://issues.apache.org/jira/browse/AVRO-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan Skraba updated AVRO-3370:
------------------------------
Description:
We've run across this in some code that interoperates between Java and Python.
The spec [currently forbids|https://avro.apache.org/docs/current/spec.html#names]
using a primitive type name as the name of a schema: _*Primitive type names have
no namespace and their names may not be defined in any namespace.*_
{code:java}
{"type":"record","name":"long","fields":[{"name":"a1","type":"long"}]} {code}
That fails in Java with {{"org.apache.avro.AvroTypeException: Schemas may not
be named after primitives: long"}}
What should happen when a named schema uses a complex type name, such as
{{record}}, as its name?
{code:java}
{"type":"record","name":"record","fields":[{"name":"a1","type":"long"}]} {code}
This currently *succeeds* in Java and the schema can be used to serialize and
deserialize data.
This currently *fails* in Python with: {{avro.schema.SchemaParseException:
record is a reserved type name}}
Which one is the correct behaviour?
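To make the two behaviours concrete, here is a minimal Python sketch of the two validation policies, assuming a name is checked as a simple name (without its namespace). The helpers {{check_name_strict}}, {{check_name_permissive}} and {{accepts}} are hypothetical: they only model the outcomes reported above and are not taken from either SDK.

```python
# Hypothetical sketch contrasting two name-validation policies for named
# schemas. The function names are illustrative; they are not SDK APIs.

PRIMITIVE_TYPES = frozenset(
    {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
)
# Keywords used in the "type" attribute of complex schemas.
COMPLEX_TYPES = frozenset({"record", "enum", "array", "map", "fixed"})


def check_name_strict(name: str) -> None:
    """Reject every type keyword, matching the outcome seen in Python today."""
    if name in PRIMITIVE_TYPES or name in COMPLEX_TYPES:
        raise ValueError(f"{name} is a reserved type name")


def check_name_permissive(name: str) -> None:
    """Reject only primitive type names, matching the outcome seen in Java."""
    if name in PRIMITIVE_TYPES:
        raise ValueError(f"Schemas may not be named after primitives: {name}")


def accepts(check, name: str) -> bool:
    """Return True if the given policy accepts `name` as a simple schema name."""
    try:
        check(name)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for name in ("long", "record", "LinkedList"):
        print(name, accepts(check_name_strict, name), accepts(check_name_permissive, name))
```

Running it prints {{long False False}}, {{record False True}} and {{LinkedList True True}}: the two policies only disagree on complex type names such as {{record}}.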
This gets a bit more complicated when we consider using the name as a reference.
The following two schemas both work in Java:
{code:java}
{"type":"record","name":"LinkedList",
"fields":[
{"name":"value","type":"int"},
{"name":"next","type":["null","LinkedList"]}]} {code}
{code:java}
{"type":"record","name":"LinkedList",
"fields":[
{"name":"value","type":"int"},
{"name":"next","type":["null",{"type":"LinkedList"}]}]}
{code}
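If we rename {{LinkedList}} to {{record}}, the former succeeds in Java and the
latter fails with {{org.apache.avro.SchemaParseException: No name in schema:
{"type":"record"}}}, because the object form is then parsed as a new, unnamed
record definition rather than as a reference to the record named {{record}}.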
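A simplified Python model of why the two forms diverge, assuming (as the error message suggests) that in the object form the {{type}} keyword is checked before user-defined names are looked up. The {{interpret}} helper is hypothetical and is not the Java parser's actual code.

```python
import json

PRIMITIVE_TYPES = frozenset(
    {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
)
COMPLEX_TYPES = frozenset({"record", "enum", "array", "map", "fixed"})
NAMED_TYPES = frozenset({"record", "enum", "fixed"})


def interpret(node, defined_names):
    """Classify one type node (a JSON string or object) as "primitive",
    "reference" or "definition". Unions are left to the caller.

    Simplified, hypothetical model of a parser (not SDK code):
    - a bare string is a primitive or a reference to an already defined name;
    - an object whose "type" is a complex keyword starts a new definition,
      even if a user-defined type has the same name;
    - any other object "type" is resolved like a bare string.
    """
    if isinstance(node, dict):
        type_name = node["type"]
        if type_name in COMPLEX_TYPES:
            if type_name in NAMED_TYPES and "name" not in node:
                raise ValueError(f"No name in schema: {json.dumps(node)}")
            return "definition"
        node = type_name  # e.g. {"type": "LinkedList"} behaves like "LinkedList"
    if node in PRIMITIVE_TYPES:
        return "primitive"
    if node in defined_names:
        return "reference"
    raise ValueError(f"Undefined name: {node}")


if __name__ == "__main__":
    # Assume the enclosing record named "record" is already registered,
    # as it must be for self-references to work.
    defined = {"record"}
    print([interpret(branch, defined) for branch in ["null", "record"]])
    try:
        interpret({"type": "record"}, defined)
    except ValueError as err:
        print(err)
```

With the type renamed to {{record}}, the bare-string branch resolves as a reference, while the object form raises {{No name in schema}}, mirroring the Java results above.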
{*}Edit{*}: The consensus on the [mailing list|https://lists.apache.org/thread/0wmgyx6z69gy07lvj9ndko75752b8cn2]
is that the "permissive" behaviour of Java should be adopted, in order to align
the SDKs. The specification doesn't currently forbid using complex type names as
names, and it should say so explicitly. We should probably also state that it's
a best practice to avoid doing this, especially in the null namespace, since it
can confuse readers and may cause ambiguities in the JSON encoding of data.
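To illustrate the JSON ambiguity, a minimal Python sketch of how a union branch is labelled in Avro's JSON encoding (a single-key object whose key is the full name for named types and the type name otherwise), assuming a union that contains both an {{int}} array and a record named {{array}}. The helpers {{union_branch_key}} and {{encode_branch}} and the {{com.example}} namespace are hypothetical.

```python
import json


def union_branch_key(schema):
    """Key used for a non-null union branch in Avro's JSON encoding.

    Per the spec, named types (record, enum, fixed) use their full name and
    other types use the type name. Namespaces inherited from an enclosing
    schema are ignored in this sketch.
    """
    if isinstance(schema, str):
        return schema  # a primitive, or a named type referenced by full name
    type_name = schema["type"]
    if type_name in ("record", "enum", "fixed"):
        name = schema["name"]
        namespace = schema.get("namespace")
        if "." in name or not namespace:
            return name  # already a full name, or in the null namespace
        return f"{namespace}.{name}"
    return type_name  # "array" or "map"


def encode_branch(schema, value):
    """JSON-encode `value` as the given (non-null) union branch."""
    return json.dumps({union_branch_key(schema): value})


int_array = {"type": "array", "items": "int"}
# A record that borrows the complex type name "array", in the null namespace.
record_named_array = {
    "type": "record",
    "name": "array",
    "fields": [{"name": "a1", "type": "long"}],
}

if __name__ == "__main__":
    # In a union of these two schemas both branches get the same key.
    print(encode_branch(int_array, [1, 2]))
    print(encode_branch(record_named_array, {"a1": 1}))
    # With a (hypothetical) namespace the record's key is distinct again.
    print(union_branch_key(dict(record_named_array, namespace="com.example")))
```

Both branches are written under the key {{array}}, so the key alone no longer tells a reader which branch was used; giving the record a namespace avoids the clash, which is why the null namespace is singled out above.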
> [Spec] Inconsistent behaviour on types as invalid names.
> --------------------------------------------------------
>
> Key: AVRO-3370
> URL: https://issues.apache.org/jira/browse/AVRO-3370
> Project: Apache Avro
> Issue Type: Bug
> Reporter: Ryan Skraba
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.11.1
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
--
This message was sent by Atlassian Jira
(v8.20.1#820001)