It is possible to construct an ambiguous schema using the latest Avro specification. Before I file a JIRA issue, I want to check whether this is a known deficiency. I believe this is a bug in the specification, not in any particular implementation.

Types can be defined in the null namespace, and those types can be referenced later. Such a reference would not contain any dots. For example, if we define the type "Target" in the null namespace, we can refer to it with the fullname "Target". However, the specification says that when a reference has no dot, "the namespace is the namespace of the enclosing definition." Now suppose we also define a different type "Target" in the namespace "org.apache.avro". It can be referenced with the fullname "org.apache.avro.Target". If the enclosing namespace is already "org.apache.avro", then it can also be referenced with the simple name "Target". The problem arises when a single schema includes both of those types, and "Target" is a valid reference to either one.

In short, it is impossible to distinguish a qualified name that happens to be in the null namespace from a simple name. The specification creates this problem by neglecting the null namespace when it defines a fullname as "composed of two parts: a name and a namespace, separated by a dot."
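A minimal Java model of the quoted reference rule makes the collision visible. It does not use the Avro library: the class `SpecNameResolution` and its `resolve` method are hypothetical, written only to mirror the specification text, and a null or empty namespace stands for the null namespace.

```java
/** Hypothetical model of the specification's rule for resolving a name reference. */
final class SpecNameResolution {

    /**
     * A reference that contains a dot is a fullname; otherwise the namespace
     * of the enclosing definition is prepended. A null or empty namespace
     * stands for the null namespace.
     */
    static String resolve(String reference, String enclosingNamespace) {
        if (reference.contains(".")) {
            return reference; // already a fullname
        }
        if (enclosingNamespace == null || enclosingNamespace.isEmpty()) {
            return reference; // enclosing definition is in the null namespace
        }
        return enclosingNamespace + "." + reference;
    }

    public static void main(String[] args) {
        // Enclosed by a null-namespace record such as "Ambiguous".
        System.out.println(resolve("Target", null));              // prints Target
        // Enclosed by "org.apache.avro.Enclosing": same text, different type.
        System.out.println(resolve("Target", "org.apache.avro")); // prints org.apache.avro.Target
    }
}
```

Under this rule the reference text `Target` names different types depending on the enclosing namespace, and nothing in the text itself tells a reader whether it was written as a null-namespace fullname or as a simple name.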

This could be solved by simply resolving all ambiguities in favor of the null namespace reference. For example, the reference "Target" should be interpreted as a fullname if such a type exists and as a simple name otherwise. If the author didn't intend to reference into the null namespace, then they can unambiguously use a fullname reference instead. Any solution will create compatibility concerns, so first I just want to discuss whether this is believed to be a problem.
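The proposed tie-break can be sketched the same way. `ProposedNameResolution` is a hypothetical class that models the rule suggested above; it is not how the Java implementation behaves today. It assumes the resolver knows the set of fullnames defined so far in the schema.

```java
import java.util.Set;

/** Hypothetical sketch of the proposed tie-break rule; not Avro's actual behavior. */
final class ProposedNameResolution {

    /**
     * A dot-free reference is first tried as a null-namespace fullname;
     * only if no such type is defined is it treated as a simple name.
     */
    static String resolve(String reference, String enclosingNamespace,
                          Set<String> definedFullnames) {
        if (reference.contains(".")) {
            return reference; // explicit fullname, never ambiguous
        }
        if (definedFullnames.contains(reference)) {
            return reference; // prefer the null-namespace type
        }
        if (enclosingNamespace == null || enclosingNamespace.isEmpty()) {
            return reference;
        }
        return enclosingNamespace + "." + reference; // simple-name reading
    }

    public static void main(String[] args) {
        Set<String> bothDefined = Set.of("Target", "org.apache.avro.Target");
        Set<String> onlyNamespaced = Set.of("org.apache.avro.Target");

        // Both types defined: "Target" now means the null-namespace type.
        System.out.println(resolve("Target", "org.apache.avro", bothDefined));
        // To get the namespaced type, the author writes its fullname.
        System.out.println(resolve("org.apache.avro.Target", "org.apache.avro", bothDefined));
        // No null-namespace "Target": the simple-name reading still applies.
        System.out.println(resolve("Target", "org.apache.avro", onlyNamespaced));
    }
}
```

With this rule, schemas that never define a null-namespace type with the same name resolve exactly as before; only schemas that relied on the simple-name reading while such a type exists would change meaning.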

The following complete test case illustrates how this issue leads to data corruption with the Java API. Note that the Java implementation neither detects the ambiguity nor resolves it the way I am recommending.

    import org.apache.avro.Schema;
    import org.apache.avro.SchemaBuilder;
    import org.junit.jupiter.api.Assertions;
    import org.junit.jupiter.api.Test;

    class AmbiguousReferenceTest {

        @Test
        void testAmbiguousReference() {
            // Defined in the null namespace: fullname "Target".
            final Schema target = SchemaBuilder.builder()
                    .record("Target")
                    .doc("right")
                    .fields()
                    .endRecord();
            // Same simple name, but fullname "org.apache.avro.Target".
            final Schema decoy = SchemaBuilder.builder()
                    .record(target.getName())
                    .namespace("org.apache.avro")
                    .doc("wrong")
                    .fields()
                    .endRecord();
            final Schema ambiguous = SchemaBuilder.builder()
                    .record("Ambiguous")
                    .fields()
                        .name("definition")
                            .type(target)
                            .noDefault()
                        .name("working")
                            .type(target)
                            .noDefault()
                        .name("enclosing")
                            .type(SchemaBuilder.builder()
                                    .record("Enclosing")
                                    .namespace("org.apache.avro")
                                    .fields()
                                        .name("decoy")
                                            .type(decoy)
                                            .noDefault()
                                        .name("working")
                                            .type(decoy)
                                            .noDefault()
                                        // Refers to the null-namespace "Target"
                                        // from inside "org.apache.avro".
                                        .name("broken")
                                            .type(target)
                                            .noDefault()
                                    .endRecord())
                            .noDefault()
                    .endRecord();
            // Round-trip the schema through its JSON text.
            final Schema parsed = new Schema.Parser().parse(
                    ambiguous.toString());
            // This assertion succeeds.
            Assertions.assertEquals(
                    ambiguous.getField("working").schema(),
                    parsed.getField("working").schema());
            // This assertion succeeds but the specification is unclear.
            Assertions.assertEquals(
                    ambiguous.getField("enclosing").schema()
                            .getField("working").schema(),
                    parsed.getField("enclosing").schema()
                            .getField("working").schema());
            // This assertion FAILS.
            Assertions.assertEquals(
                    ambiguous.getField("enclosing").schema()
                            .getField("broken").schema(),
                    parsed.getField("enclosing").schema()
                            .getField("broken").schema());
        }
    }

The failing assertion reports:

    expected: <{"type":"record","name":"Target","doc":"right","fields":[]}> but was: <{"type":"record","name":"Target","namespace":"org.apache.avro","doc":"wrong","fields":[]}>
