[jira] [Commented] (AVRO-2002) Canonical form strip the default value : Schema resolution may provide 2 different answers with same schema's fingerprint

Doug Cutting (JIRA) Thu, 16 Feb 2017 09:57:04 -0800

    [ 
https://issues.apache.org/jira/browse/AVRO-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870374#comment-15870374
 ]


Doug Cutting commented on AVRO-2002:
------------------------------------

I believe you have misunderstood the semantics of fingerprints.  Identical 
fingerprints mean that one schema can read output of the other without schema 
resolution, not that both can read a third using schema resolution.

Schema resolution permits interoperability (in some cases) between a pair of 
schemas whose fingerprints do not match.  The SchemaCompatibility class can 
determine whether a pair of schemas can, through resolution, interoperate.
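
As a rough illustration of why the two notions diverge in the example quoted 
below: the fingerprint is computed from the canonical form, which drops the 
default attribute, while resolution consults it. This is a toy model built on 
plain Java maps, not Avro's implementation. The class FingerprintVsResolution 
and the helpers field, strip and canRead are hypothetical names, and canRead 
checks only the missing-field rule (it ignores type matching and every other 
resolution rule).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FingerprintVsResolution {
    // Attributes the spec's [STRIP] rule keeps; all others are dropped.
    static final Set<String> KEPT =
            Set.of("type", "name", "fields", "symbols", "items", "values", "size");

    // Toy model of a record field: a flat map of attribute name to value.
    static Map<String, Object> field(String name, String type) {
        Map<String, Object> f = new LinkedHashMap<>();
        f.put("name", name);
        f.put("type", type);
        return f;
    }

    // Apply the [STRIP] rule to a single field.
    static Map<String, Object> strip(Map<String, Object> field) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : field.entrySet()) {
            if (KEPT.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    // Simplified resolution check: a reader field that the writer lacks
    // must carry a default, otherwise reading fails.
    static boolean canRead(List<Map<String, Object>> readerFields,
                           Set<String> writerFieldNames) {
        for (Map<String, Object> f : readerFields) {
            boolean inWriter = writerFieldNames.contains((String) f.get("name"));
            if (!inWriter && !f.containsKey("default")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> oldField = field("field", "long");
        Map<String, Object> withDefault = field("newField", "int");
        withDefault.put("default", 0);
        Map<String, Object> noDefault = field("newField", "int");
        Set<String> writerFieldNames = Set.of("field");

        // Same stripped form, hence the same fingerprint input.
        System.out.println(strip(withDefault).equals(strip(noDefault)));
        // Different resolution outcome against the old writer.
        System.out.println(canRead(List.of(oldField, withDefault), writerFieldNames));
        System.out.println(canRead(List.of(oldField, noDefault), writerFieldNames));
    }
}
```

In this sketch the two variants of newField strip to the same map, so they 
would feed the same bytes into a fingerprint, yet only the variant with a 
default passes the missing-field check against the old writer.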

> Canonical form strip the default value : Schema resolution may provide 2 
> different answers with same schema's fingerprint
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-2002
>                 URL: https://issues.apache.org/jira/browse/AVRO-2002
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.8.1
>            Reporter: Deslandes Hugues
>
> I understand that a schema's fingerprint uniquely describes the Avro 
> schema. The following example shows 2 different schemas with the same 
> fingerprint but different behaviours: one can read the writer's data, the 
> other can't. I guess it is a bug, but maybe it's only a misinterpretation.
> Here are the details:
> First, the Canonical form of an Avro Schema is derived using this rule (see 
> http://avro.apache.org/docs/1.8.1/spec.html#Transforming+into+Parsing+Canonical+Form):
> {quote}
> [STRIP] Keep only attributes that are relevant to parsing data, which are: 
> type, name, fields, symbols, items, values, size. Strip all others (e.g., doc 
> and aliases). {quote}  
> So any default attribute is removed.
> On the other hand, Schema Resolution is done using this particular rule 
> (see http://avro.apache.org/docs/1.8.1/spec.html#Schema+Resolution):
> {quote}if the reader's record schema has a field with no default value, and 
> writer's schema does not have a field with the same name, an error is 
> signalled.{quote}
> To illustrate the situation on a simple schema (writer), I have created a new 
> version by adding a new field to the schema with 2 options: one has a default 
> attribute and value, the other one hasn’t.  The first one can read old 
> version of writer, the second one can’t.
> In other words, the canonical form does not take into account any default 
> attribute of the record fields, but the resolution algorithm uses the default 
> attribute to evaluate compatibility. The conclusion is that 2 schemas 
> that differ only by a default attribute have the same fingerprint, yet one is 
> compatible with the writer schema and the other is not.
> I understand the different behaviours, but not that they occur with the same 
> fingerprint.
> I would suggest that the canonical form not strip the presence of the default 
> attribute (only its value, which should not affect compatibility).
> The immediate workaround I will use is to systematically use a default value 
> for any additional field.
> {code:linenumbers=true|language=java}
> package Main;
> import java.util.Collections;
> import org.apache.avro.Schema;
> import org.apache.avro.SchemaCompatibility;
> import org.apache.avro.SchemaNormalization;
> import org.apache.avro.SchemaValidationException;
> import org.apache.avro.SchemaValidator;
> import org.apache.avro.SchemaValidatorBuilder;
> public class Main {
>     public static void main(String[] args) {
>         Schema schemaWriter = new org.apache.avro.Schema.Parser().parse(
>                 "{\"type\":\"record\",\"name\":\"ExampleAvro\",\"fields\":[{\"name\":\"field\",\"type\":\"long\"}]}");
>         Schema schemaReader = new org.apache.avro.Schema.Parser().parse(
>                 "{\"type\":\"record\",\"name\":\"ExampleAvro\",\"fields\":[{\"name\":\"field\",\"type\":\"long\"},{\"name\":\"newField\",\"type\":\"int\",\"default\":0}]}");
>         Schema schemaReaderNoDefault = new org.apache.avro.Schema.Parser().parse(
>                 "{\"type\":\"record\",\"name\":\"ExampleAvro\",\"fields\":[{\"name\":\"field\",\"type\":\"long\"},{\"name\":\"newField\",\"type\":\"int\"}]}");
>         long fpWriter = SchemaNormalization.parsingFingerprint64(schemaWriter);
>         long fpReader = SchemaNormalization.parsingFingerprint64(schemaReader);
>         long fpReaderNoDefault = SchemaNormalization.parsingFingerprint64(schemaReaderNoDefault);
>
>         System.out.println("Schema writer          " + fpWriter + " " + schemaWriter);
>         System.out.println("Schema reader          " + fpReader + " " + schemaReader);
>         System.out.println("Schema readerNoDefault " + fpReaderNoDefault + " " + schemaReaderNoDefault);
>         // check compatibility : method 1
>         String res = SchemaCompatibility.checkReaderWriterCompatibility(schemaReader, schemaWriter).getType().toString();
>         String resNoDefault = SchemaCompatibility.checkReaderWriterCompatibility(schemaReaderNoDefault, schemaWriter).getType().toString();
>
>         System.out.println(fpReader + " is " + res + " with " + fpWriter);
>         System.out.println(fpReaderNoDefault + " is " + resNoDefault + " with " + fpWriter);
>         // check compatibility : method 2
>         SchemaValidator validator = new SchemaValidatorBuilder().canReadStrategy().validateAll();
>         String isCompatible = "";
>         try {
>             validator.validate(schemaReaderNoDefault, Collections.singletonList(schemaWriter));
>         } catch (SchemaValidationException e) {
>             isCompatible = "not ";
>         }
>         System.out.println(fpReaderNoDefault + " is " + isCompatible + "compatible with " + fpWriter);
>         isCompatible = "";
>         try {
>             validator.validate(schemaReader, Collections.singletonList(schemaWriter));
>         } catch (SchemaValidationException e) {
>             isCompatible = "not ";
>         }
>         System.out.println(fpReader + " is " + isCompatible + "compatible with " + fpWriter);
>         System.out.println("------------");
>     }
>     //The output is :
>     //Schema writer          8957007963871099370 {"type":"record","name":"ExampleAvro","fields":[{"name":"field","type":"long"}]}
>     //Schema reader          489516346825099350 {"type":"record","name":"ExampleAvro","fields":[{"name":"field","type":"long"},{"name":"newField","type":"int","default":0}]}
>     //Schema readerNoDefault 489516346825099350 {"type":"record","name":"ExampleAvro","fields":[{"name":"field","type":"long"},{"name":"newField","type":"int"}]}
>     //489516346825099350 is COMPATIBLE with 8957007963871099370
>     //489516346825099350 is INCOMPATIBLE with 8957007963871099370
>     //489516346825099350 is not compatible with 8957007963871099370
>     //489516346825099350 is compatible with 8957007963871099370
>
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
