DESLANDES created AVRO-2002:
-------------------------------
Summary: Canonical form strip the default value : Schema
resolution may provide 2 different answers with same schema's fingerprint
Key: AVRO-2002
URL: https://issues.apache.org/jira/browse/AVRO-2002
Project: Avro
Issue Type: Bug
Components: java
Affects Versions: 1.8.1
Reporter: DESLANDES
As I understand it, a schema's fingerprint uniquely identifies the Avro schema.
The following example shows 2 different schemas with the same fingerprint but
different behaviours: one can read the writer's data, the other one can't. I guess it
is a bug, but maybe it's only a misinterpretation on my part.
Here are the details:
First, the Canonical form of an Avro Schema is derived using this rule (see
http://avro.apache.org/docs/1.8.1/spec.html#Transforming+into+Parsing+Canonical+Form):
{quote}
[STRIP] Keep only attributes that are relevant to parsing data, which are:
type, name, fields, symbols, items, values, size. Strip all others (e.g., doc
and aliases). {quote}
So any default attribute is removed.
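The effect of the [STRIP] rule on a single field can be sketched as follows. This is a simplified model that uses plain maps, not Avro's own implementation; the class name StripRuleSketch and the method strip are hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class StripRuleSketch {
    // Attributes that the [STRIP] rule quoted above keeps; all others are dropped.
    static final Set<String> KEPT = new HashSet<>(Arrays.asList(
        "type", "name", "fields", "symbols", "items", "values", "size"));

    // Returns a copy of the attribute map restricted to the kept attributes.
    static Map<String, Object> strip(Map<String, Object> attrs) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : attrs.entrySet()) {
            if (KEPT.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Field declared with a default value.
        Map<String, Object> withDefault = new LinkedHashMap<>();
        withDefault.put("name", "newField");
        withDefault.put("type", "int");
        withDefault.put("default", 0);

        // Same field declared without a default value.
        Map<String, Object> noDefault = new LinkedHashMap<>();
        noDefault.put("name", "newField");
        noDefault.put("type", "int");

        // After stripping, the two declarations are indistinguishable.
        System.out.println(strip(withDefault).equals(strip(noDefault))); // prints "true"
    }
}
```

Since the fingerprint is computed from the stripped form, both declarations necessarily hash to the same value.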
On the other hand, Schema Resolution applies this particular rule (see
http://avro.apache.org/docs/1.8.1/spec.html#Schema+Resolution):
{quote}if the reader's record schema has a field with no default value, and
writer's schema does not have a field with the same name, an error is
signalled.{quote}
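That rule alone can be modelled in a few lines. The sketch below is a simplified stand-in, not Avro's resolver: it ignores type promotion, aliases and nested types, and the names ResolutionRuleSketch, ReaderField and canRead are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class ResolutionRuleSketch {
    // Hypothetical, minimal description of a field in the reader's record schema.
    static final class ReaderField {
        final String name;
        final boolean hasDefault;

        ReaderField(String name, boolean hasDefault) {
            this.name = name;
            this.hasDefault = hasDefault;
        }
    }

    // Applies only the quoted rule: a reader field that is missing from the
    // writer's schema and has no default value makes resolution fail.
    static boolean canRead(List<ReaderField> readerFields, Set<String> writerFieldNames) {
        for (ReaderField f : readerFields) {
            if (!writerFieldNames.contains(f.name) && !f.hasDefault) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> writer = Collections.singleton("field");

        List<ReaderField> readerWithDefault = Arrays.asList(
            new ReaderField("field", false), new ReaderField("newField", true));
        List<ReaderField> readerNoDefault = Arrays.asList(
            new ReaderField("field", false), new ReaderField("newField", false));

        System.out.println(canRead(readerWithDefault, writer)); // prints "true"
        System.out.println(canRead(readerNoDefault, writer));   // prints "false"
    }
}
```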
To illustrate, I took a simple writer schema and created two new reader
versions by adding one field to it: in the first, the new field has a default
attribute and value; in the second, it has none. The first one can read data
written with the old writer schema, the second one can't.
In other words, the canonical form does not take into account any default
attribute of the record fields, but the resolution algorithm uses the default
attribute to evaluate compatibility. The conclusion is that 2 schemas that
differ only by a default attribute have the same fingerprint, yet one is
compatible with the writer schema and the other one is not.
I understand why the behaviours differ, but not why both schemas share the same fingerprint.
I would suggest that the canonical form keep the presence of the default
attribute (but strip the default value itself, which should not interfere with
compatibility).
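One possible reading of this suggestion is sketched below. Everything in it is an assumption made for illustration: the class DefaultAwareCanonicalSketch, the method canonicalField and the "default":true marker are hypothetical and are not part of the Avro specification or library.

```java
public class DefaultAwareCanonicalSketch {
    // Renders one field in a canonical-like text form. The default VALUE is never
    // an input; only its presence is recorded, using a hypothetical marker.
    static String canonicalField(String name, String type, boolean hasDefault) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"name\":\"").append(name).append("\"");
        sb.append(",\"type\":\"").append(type).append("\"");
        if (hasDefault) {
            sb.append(",\"default\":true"); // hypothetical presence marker
        }
        sb.append("}");
        return sb.toString();
    }

    public static void main(String[] args) {
        String withDefault = canonicalField("newField", "int", true);
        String noDefault = canonicalField("newField", "int", false);

        System.out.println(withDefault); // {"name":"newField","type":"int","default":true}
        System.out.println(noDefault);   // {"name":"newField","type":"int"}
        // The two forms now differ, so a fingerprint computed over them would too.
        System.out.println(withDefault.equals(noDefault)); // prints "false"
    }
}
```

With such a form, two schemas that differ only in the default value would still share a fingerprint, while a schema with a default and one without would not.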
The immediate workaround I will use is to systematically use a default value
for any additional field.
{code:linenumbers=true|language=java}
package Main;

import java.util.Collections;

import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaNormalization;
import org.apache.avro.SchemaValidationException;
import org.apache.avro.SchemaValidator;
import org.apache.avro.SchemaValidatorBuilder;

public class Main {

    public static void main(String[] args) {
        // Writer schema: a record with a single field.
        Schema schemaWriter = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"ExampleAvro\",\"fields\":[{\"name\":\"field\",\"type\":\"long\"}]}");
        // Reader schema: adds newField with a default value.
        Schema schemaReader = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"ExampleAvro\",\"fields\":[{\"name\":\"field\",\"type\":\"long\"},{\"name\":\"newField\",\"type\":\"int\",\"default\":0}]}");
        // Reader schema: adds newField without a default value.
        Schema schemaReaderNoDefault = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"ExampleAvro\",\"fields\":[{\"name\":\"field\",\"type\":\"long\"},{\"name\":\"newField\",\"type\":\"int\"}]}");

        // Fingerprints are computed from the Parsing Canonical Form.
        long fpWriter = SchemaNormalization.parsingFingerprint64(schemaWriter);
        long fpReader = SchemaNormalization.parsingFingerprint64(schemaReader);
        long fpReaderNoDefault = SchemaNormalization.parsingFingerprint64(schemaReaderNoDefault);

        System.out.println("Schema writer " + fpWriter + " " + schemaWriter);
        System.out.println("Schema reader " + fpReader + " " + schemaReader);
        System.out.println("Schema readerNoDefault " + fpReaderNoDefault + " " + schemaReaderNoDefault);

        // check compatibility : method 1
        String res = SchemaCompatibility
            .checkReaderWriterCompatibility(schemaReader, schemaWriter).getType().toString();
        String resNoDefault = SchemaCompatibility
            .checkReaderWriterCompatibility(schemaReaderNoDefault, schemaWriter).getType().toString();
        System.out.println(fpReader + " is " + res + " with " + fpWriter);
        System.out.println(fpReaderNoDefault + " is " + resNoDefault + " with " + fpWriter);

        // check compatibility : method 2
        SchemaValidator validator = new SchemaValidatorBuilder().canReadStrategy().validateAll();
        String isCompatible = "";
        try {
            validator.validate(schemaReaderNoDefault, Collections.singletonList(schemaWriter));
        } catch (SchemaValidationException e) {
            isCompatible = "not ";
        }
        System.out.println(fpReaderNoDefault + " is " + isCompatible + "compatible with " + fpWriter);

        isCompatible = "";
        try {
            validator.validate(schemaReader, Collections.singletonList(schemaWriter));
        } catch (SchemaValidationException e) {
            isCompatible = "not ";
        }
        System.out.println(fpReader + " is " + isCompatible + "compatible with " + fpWriter);
        System.out.println("------------");
    }

    // The output is:
    // Schema writer 8957007963871099370 {"type":"record","name":"ExampleAvro","fields":[{"name":"field","type":"long"}]}
    // Schema reader 489516346825099350 {"type":"record","name":"ExampleAvro","fields":[{"name":"field","type":"long"},{"name":"newField","type":"int","default":0}]}
    // Schema readerNoDefault 489516346825099350 {"type":"record","name":"ExampleAvro","fields":[{"name":"field","type":"long"},{"name":"newField","type":"int"}]}
    // 489516346825099350 is COMPATIBLE with 8957007963871099370
    // 489516346825099350 is INCOMPATIBLE with 8957007963871099370
    // 489516346825099350 is not compatible with 8957007963871099370
    // 489516346825099350 is compatible with 8957007963871099370
}
{code}