That sounds like a good direction. I filed https://issues.apache.org/jira/browse/AVRO-4149 for the specification changes and https://issues.apache.org/jira/browse/AVRO-4150 for the Java bug.

On 6/8/25 06:30, Oscar Westra van Holthe - Kind wrote:
Hi Chad,

This ambiguity has indeed been brought up before, and the spec and the Java
implementation are not in sync on this.

Originally, the spec followed Java packages on this: from inside a package, it's
not possible to reference a class in the default/null package. This also means
that in Avro, a simple name is first referenced against the current namespace.

Referencing a simple name in the null namespace was implemented as a courtesy,
but names in the current namespace take precedence. Otherwise, using namespaces
would be cumbersome and namespaces would become an unused feature.
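
The precedence described above can be sketched in plain Java. This is only an illustration, not the Avro implementation: the class name, the method, and the set of defined fullnames are hypothetical, and treating a leading dot as an explicit null-namespace marker is an assumption based on the ".Target" form mentioned in this thread.

```java
import java.util.Set;

// Hypothetical sketch of "current namespace first" resolution; not Avro code.
final class CurrentNamespaceFirstResolver {

    /**
     * Returns the fullname a reference resolves to. A type in the null
     * namespace is represented here by its bare name, e.g. "Target".
     */
    static String resolve(String reference, String enclosingNamespace,
                          Set<String> definedFullNames) {
        if (reference.startsWith(".")) {
            // Assumption: ".Target" explicitly names Target in the null namespace.
            return reference.substring(1);
        }
        if (reference.contains(".")
                || enclosingNamespace == null
                || enclosingNamespace.isEmpty()) {
            // Already a fullname, or there is no enclosing namespace to apply.
            return reference;
        }
        String qualified = enclosingNamespace + "." + reference;
        // Names in the current namespace take precedence; the null namespace
        // is only the fallback. No check is made that the fallback exists.
        return definedFullNames.contains(qualified) ? qualified : reference;
    }
}
```

With both "Target" and "org.apache.avro.Target" defined, the simple name "Target" inside "org.apache.avro" resolves to the namespaced type, and the null-namespace type is reachable only through the leading-dot form.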

What we should do, IMHO, is fix the spec to explicitly mention the null namespace
in a full name (and make sure the rest of the text is consistent). Then, the
Java SDK should be fixed to write a full name in the null namespace like
".Target" instead of "Target".

Kind regards,
Oscar

On Wed, 4 Jun 2025 at 18:15, Chad Parry <c...@parry.org> wrote:

On 6/4/25 04:09, Martin Grigorov wrote:
Hi Chad,

https://avro.apache.org/docs/++version++/specification/#names says: "The
empty string may also be used as a namespace to indicate the null
namespace."

I interpreted that sentence to be describing the standalone "namespace"
attribute used in type definitions, not the type name used in references.

That is, by using an empty string for the namespace you can help the
resolver to find the schema definition. I.e. you can use `name: ".Target"`
to say "I want to use the top-level Target schema".

The very next paragraph directly contradicts that: "The null namespace
may not be used in a dot-separated sequence of names." If your
interpretation is correct, then the word "not" needs to be removed. Or
if that sentence is only intended to restrict the standalone "namespace"
attribute, then it should explicitly allow the use of a null namespace
in a dot-separated fullname. An example type name with a leading dot
would be welcome.

I tested the Java API, and I'm surprised to see that it does partially
conform to your interpretation. The Schema implementation serializes
references into the null namespace incorrectly, causing the data
corruption I illustrated. A bug would have to be filed for that. But the
Parser implementation does accept ".Target" as a type name in references.




On Wed, Jun 4, 2025 at 1:49 AM Chad Parry <c...@parry.org> wrote:

It is possible to construct an ambiguous schema using the latest Avro
specification. Before I file a JIRA issue, I want to check whether this
is a known deficiency. I believe this is a bug in the specification, not
any particular implementation.

Types can be defined in the null namespace, and then those types can be
referenced later. Such a reference would not contain any dots. For
example, if we define the type "Target" in the null namespace, we can
refer to it with the fullname "Target". However, the specification says
that when a reference has no dot, "the namespace is the namespace of the
enclosing definition." That means we could define a different type
"Target" in the namespace "org.apache.avro". It could be referenced with
the fullname "org.apache.avro.Target". If the enclosing namespace is
already "org.apache.avro", then it could also be referenced with the
simple name "Target". The problem arises when a single schema includes
both those types, and "Target" is a valid reference to either one.
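
To make the ambiguity concrete, here is a minimal sketch in plain Java that lists every defined type a dotless reference could denote. The class and method names are hypothetical and nothing here is Avro API; a type in the null namespace is represented by its bare name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper that enumerates the readings of a dotless reference.
final class AmbiguityCheck {

    /** Returns every defined fullname that the dotless reference could mean. */
    static List<String> candidates(String simpleName, String enclosingNamespace,
                                   Set<String> definedFullNames) {
        List<String> matches = new ArrayList<>();
        // Reading 1: a fullname that happens to be in the null namespace.
        if (definedFullNames.contains(simpleName)) {
            matches.add(simpleName);
        }
        // Reading 2: a simple name qualified by the enclosing namespace.
        if (enclosingNamespace != null && !enclosingNamespace.isEmpty()) {
            String qualified = enclosingNamespace + "." + simpleName;
            if (definedFullNames.contains(qualified)) {
                matches.add(qualified);
            }
        }
        return matches;
    }
}
```

When both "Target" and "org.apache.avro.Target" are defined, a reference to "Target" from inside "org.apache.avro" yields two candidates, which is exactly the case described above.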

In short, it is impossible to distinguish a qualified name that happens
to be in the null namespace from a simple name. The specification
creates this problem by neglecting the null namespace when it defines a
fullname as "composed of two parts: a name and a namespace, separated by
a dot."

This could be solved by simply resolving all ambiguities in favor of the
null namespace reference. For example, the reference "Target" should be
interpreted as a fullname if such a type exists and as a simple name
otherwise. If the author didn't intend to reference into the null
namespace, then they can unambiguously use a fullname reference instead.
Any solution will create compatibility concerns, so first I just want to
discuss whether this is believed to be a problem.
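
As a rough sketch of the resolution rule proposed here (null namespace wins when such a type exists), assuming a hypothetical set of defined fullnames and a hypothetical resolver class that is not part of any Avro API:

```java
import java.util.Set;

// Hypothetical sketch of "null namespace first" resolution; not Avro code.
final class NullNamespaceFirstResolver {

    /**
     * Returns the fullname a reference resolves to. A type in the null
     * namespace is represented here by its bare name, e.g. "Target".
     */
    static String resolve(String reference, String enclosingNamespace,
                          Set<String> definedFullNames) {
        if (reference.contains(".")) {
            // A dotted reference is already an unambiguous fullname.
            return reference;
        }
        if (definedFullNames.contains(reference)) {
            // Ambiguities are resolved in favor of the null namespace.
            return reference;
        }
        if (enclosingNamespace == null || enclosingNamespace.isEmpty()) {
            return reference;
        }
        // Otherwise treat the reference as a simple name in the enclosing namespace.
        return enclosingNamespace + "." + reference;
    }
}
```

Under this rule an author who wants the namespaced type while a null-namespace type of the same name exists has to write the dotted fullname, such as "org.apache.avro.Target".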

The following complete test case illustrates how this issue leads to
data corruption with the Java API. Note that the Java implementation
neither detects the ambiguity nor resolves it the way I am recommending.

       @Test
       void testAmbiguousReference() {
           final Schema target = SchemaBuilder.builder()
                   .record("Target")
                   .doc("right")
                   .fields()
                   .endRecord();
           final Schema decoy = SchemaBuilder.builder()
                   .record(target.getName())
                   .namespace("org.apache.avro")
                   .doc("wrong")
                   .fields()
                   .endRecord();
           final Schema ambiguous = SchemaBuilder.builder()
                   .record("Ambiguous")
                   .fields()
                       .name("definition")
                           .type(target)
                           .noDefault()
                       .name("working")
                           .type(target)
                           .noDefault()
                       .name("enclosing")
                           .type(SchemaBuilder.builder()
                                   .record("Enclosing")
                                   .namespace("org.apache.avro")
                                   .fields()
                                       .name("decoy")
                                           .type(decoy)
                                           .noDefault()
                                       .name("working")
                                           .type(decoy)
                                           .noDefault()
                                       .name("broken")
                                           .type(target)
                                           .noDefault()
                                   .endRecord())
                           .noDefault()
                   .endRecord();
           final Schema parsed = new Schema.Parser().parse(
                   ambiguous.toString());
           // This assertion succeeds.
           Assertions.assertEquals(
                   ambiguous.getField("working").schema(),
                   parsed.getField("working").schema());
           // This assertion succeeds but the specification is unclear.
           Assertions.assertEquals(
                   ambiguous.getField("enclosing").schema()
                           .getField("working").schema(),
                   parsed.getField("enclosing").schema()
                           .getField("working").schema());
           // This assertion FAILS.
           Assertions.assertEquals(
                   ambiguous.getField("enclosing").schema()
                           .getField("broken").schema(),
                   parsed.getField("enclosing").schema()
                           .getField("broken").schema());
       }

The assertion failure message complains:
expected: <{"type":"record","name":"Target","doc":"right","fields":[]}>
but was: <{"type":"record","name":"Target","namespace":"org.apache.avro","doc":"wrong","fields":[]}>





