[jira] [Comment Edited] (AVRO-3235) Avro Schema Evolution with Enum – Deserialization Crashes

Uwe Eisele (Jira) Tue, 30 Nov 2021 13:48:08 -0800


    [ 
https://issues.apache.org/jira/browse/AVRO-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451369#comment-17451369
 ]


Uwe Eisele edited comment on AVRO-3235 at 11/30/21, 9:47 PM:
-------------------------------------------------------------

Hello,

I think I have encountered the same issue with Avro 1.11. In my case, however,
I deliberately gave the enum a different namespace in each schema version.

Schema V1:
{code:json}
{"type":"enum","name":"PersonType","namespace":"person.v1","symbols":["UNDEFINED","CUSTOMER","EMPLOYEE"],"default":"UNDEFINED"}
{code}
Schema V2:
{code:json}
{"type":"enum","name":"PersonType","namespace":"person.v2","symbols":["UNDEFINED","CUSTOMER","EMPLOYEE"],"default":"UNDEFINED"}
{code}
The checkReaderWriterCompatibility method indicates that the schemas are 
compatible.
{code:java}
// Reader schema first, writer schema second: V2 reads data written with V1.
SchemaCompatibility.SchemaCompatibilityType compatibilityResult =
        SchemaCompatibility.checkReaderWriterCompatibility(SCHEMA_V2, SCHEMA_V1)
                .getResult().getCompatibility();
{code}
However, deserializing with V2 an enum value that was serialized with V1
fails with the same exception you described.
{code:java}
org.apache.avro.AvroTypeException: Found person.v1.PersonType, expecting person.v2.PersonType
{code}
I can understand why deserialization fails, because the Avro specification
([https://avro.apache.org/docs/current/spec.html#names]) states: "Equality
of names is defined on the fullname."
In that case, however, I would expect the compatibility check to report the
schemas as incompatible.
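
To make the two comparison rules concrete, here is a minimal sketch using a
hypothetical NamedType class. It is a simplified stand-in written for this
illustration, not Avro's own Schema class, and it only models the name and
namespace of a named type.

```java
import java.util.Objects;

/** Simplified stand-in for a named Avro schema (hypothetical, not Avro's Schema). */
final class NamedType {
    final String name;       // simple name, e.g. "PersonType"
    final String namespace;  // may be null or empty

    NamedType(String name, String namespace) {
        this.name = name;
        this.namespace = namespace;
    }

    /** Fullname: namespace + "." + name, or just the name without a namespace. */
    String fullName() {
        return (namespace == null || namespace.isEmpty()) ? name : namespace + "." + name;
    }

    /** Strict rule: two names match only if their fullnames are equal. */
    static boolean sameFullName(NamedType writer, NamedType reader) {
        return Objects.equals(writer.fullName(), reader.fullName());
    }

    /** Lenient rule: compare only the simple name and ignore the namespace. */
    static boolean sameSimpleName(NamedType writer, NamedType reader) {
        return Objects.equals(writer.name, reader.name);
    }
}
```

With person.v1.PersonType as writer and person.v2.PersonType as reader, the
strict rule rejects the pair and the lenient rule accepts it, which mirrors
the mismatch between deserialization and the compatibility check.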

In your example, "Found test.simple.v1.Status" is reported even though the
enum schema itself specifies no namespace. I think this is because the enum
inherits the namespace of its parent element (the namespace is taken from the
most tightly enclosing schema or protocol).
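
As a rough illustration of that inheritance rule, the following sketch
derives a fullname from a name, an optional namespace attribute, and the
namespace of the enclosing schema. FullNames and resolve are hypothetical
names chosen for this example; this is not how the Avro library implements
it.

```java
/** Hypothetical helper that mirrors the naming rules of the Avro specification. */
final class FullNames {
    private FullNames() {}

    /**
     * @param name               the "name" attribute (may already contain dots)
     * @param namespace          the "namespace" attribute, or null if absent
     * @param enclosingNamespace namespace of the most tightly enclosing schema, or null
     */
    static String resolve(String name, String namespace, String enclosingNamespace) {
        // A dotted name is already a fullname; any namespace attribute is ignored.
        if (name.indexOf('.') >= 0) {
            return name;
        }
        // An explicit namespace wins; otherwise inherit from the enclosing schema.
        String ns = (namespace != null) ? namespace : enclosingNamespace;
        return (ns == null || ns.isEmpty()) ? name : ns + "." + name;
    }
}
```

For the Status enum nested in the test.simple.v1.Simple record,
resolve("Status", null, "test.simple.v1") yields "test.simple.v1.Status",
which matches the name in the exception message.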

Namespace handling during deserialization is apparently not consistent across
types. If a record is used instead of an enum, the compatibility check reports
the schemas as compatible and the corresponding deserialization also works.
However, this contradicts the specification, which requires the FullName to
be compared. There are already other bug reports about this. AVRO-2793
(https://issues.apache.org/jira/browse/AVRO-2793) describes why the
compatibility check does not include the namespace. It refers to a pull
request ([https://github.com/apache/avro/pull/526/files]) in which the check
was changed from FullName to Name in order to restore the behavior of older
versions, although this deviates from the specification.

I can understand this decision, but I would then expect the same behavior for
enums as well. I think the different named types should not be treated
differently here.

For this reason, unless there is a reason not to do so, I would suggest also
changing the check for enums during deserialization from FullName to Name.

Current implementation: 
([https://github.com/apache/avro/blob/de50c244c00420825d4bd7d04c0c2d353e439367/lang/java/avro/src/main/java/org/apache/avro/Resolver.java#L391])
{code:java}
public static Action resolve(Schema w, Schema r, GenericData d) {
      if (w.getFullName() != null && !w.getFullName().equals(r.getFullName()))
        return new ErrorAction(w, r, d, ErrorType.NAMES_DONT_MATCH);
{code}
Suggested change, which would allow deserialization with a different namespace
but the same name:
{code:java}
public static Action resolve(Schema w, Schema r, GenericData d) {
      if (w.getName() != null && !w.getName().equals(r.getName()))
        return new ErrorAction(w, r, d, ErrorType.NAMES_DONT_MATCH);
{code}
It looks like there is also no test schema for an enum with a namespace 
([https://github.com/apache/avro/blob/de50c244c00420825d4bd7d04c0c2d353e439367/lang/java/avro/src/test/java/org/apache/avro/TestSchemas.java#L47]).

What do you think?

Regards,
Uwe


> Avro Schema Evolution with Enum – Deserialization Crashes
> ---------------------------------------------------------
>
>                 Key: AVRO-3235
>                 URL: https://issues.apache.org/jira/browse/AVRO-3235
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.10.2
>            Reporter: Bertram Beyer
>            Priority: Major
>
> Originally posted on Stack Overflow in June 2020
> https://stackoverflow.com/questions/62596990/avro-schema-evolution-with-enum-deserialization-crashes/
>  
> I defined two versions of a record in two separate AVSC schema files. I used
> the namespace to distinguish the versions.
> *SimpleV1.avsc*
>   
> {code:json}
> {
>   "type" : "record",
>   "name" : "Simple",
>   "namespace" : "test.simple.v1",
>   "fields" : [ 
>       {
>         "name" : "name",
>         "type" : "string"
>       }, 
>       {
>         "name" : "status",
>         "type" : {
>           "type" : "enum",
>           "name" : "Status",
>           "symbols" : [ "ON", "OFF" ]
>         },
>         "default" : "ON"
>       }
>    ]
> }
> {code}
>  
> *Example JSON*
>   
> {code:json}
> {"name":"A","status":"ON"}
> {code}
> Version 2 just has an additional description field with a default value.
> *SimpleV2.avsc*
>   
> {code:json}
> {
>   "type" : "record",
>   "name" : "Simple",
>   "namespace" : "test.simple.v2",
>   "fields" : [ 
>       {
>         "name" : "name",
>         "type" : "string"
>       }, 
>       {
>         "name" : "description",
>         "type" : "string",
>         "default" : ""
>       }, 
>       {
>         "name" : "status",
>         "type" : {
>           "type" : "enum",
>           "name" : "Status",
>           "symbols" : [ "ON", "OFF" ]
>         },
>         "default" : "ON"
>       }
>    ]
> }
> {code}
> *Example JSON*
>   
> {code:json}
> {"name":"B","description":"b","status":"ON"}
> {code}
> Both schemas were compiled to Java classes. In my example I wanted to test
> backward compatibility: a record written with V1 should be readable by a
> reader using V2, and I wanted to see the default values being inserted. This
> works as long as I do not use enums.
>   
> {code:java}
> public class EnumEvolutionExample {
>     public static void main(String[] args) throws IOException {
>         Schema schemaV1 = new org.apache.avro.Schema.Parser()
>                 .parse(new File("./src/main/resources/SimpleV1.avsc"));
>         //works as well
>         //Schema schemaV1 = test.simple.v1.Simple.getClassSchema();
>         Schema schemaV2 = new org.apache.avro.Schema.Parser()
>                 .parse(new File("./src/main/resources/SimpleV2.avsc"));
>         test.simple.v1.Simple simpleV1 = test.simple.v1.Simple.newBuilder()
>                 .setName("A")
>                 .setStatus(test.simple.v1.Status.ON)
>                 .build();
>         
>         
>         SchemaPairCompatibility schemaCompatibility =
>                 SchemaCompatibility.checkReaderWriterCompatibility(schemaV2, schemaV1);
>         //Checks that writing v1 and reading v2 schemas is compatible
>         Assert.assertEquals(SchemaCompatibilityType.COMPATIBLE,
>                 schemaCompatibility.getType());
>         
>         byte[] binaryV1 = serealizeBinary(simpleV1);
>         
>         //Crashes with: AvroTypeException: Found test.simple.v1.Status,
>         //expecting test.simple.v2.Status
>         test.simple.v2.Simple v2 = deSerealizeBinary(binaryV1,
>                 new test.simple.v2.Simple(), schemaV1);
>         
>     }
>     
>     public static byte[] serealizeBinary(SpecificRecord record) {
>         DatumWriter<SpecificRecord> writer =
>                 new SpecificDatumWriter<>(record.getSchema());
>         byte[] data = new byte[0];
>         ByteArrayOutputStream stream = new ByteArrayOutputStream();
>         Encoder binaryEncoder = EncoderFactory.get()
>             .binaryEncoder(stream, null);
>         try {
>             writer.write(record, binaryEncoder);
>             binaryEncoder.flush();
>             data = stream.toByteArray();
>         } catch (IOException e) {
>             System.out.println("Serialization error " + e.getMessage());
>         }
>         return data;
>     }
>     
>     public static <T extends SpecificRecord> T deSerealizeBinary(byte[] data,
>             T reuse, Schema writer) {
>         Decoder decoder = DecoderFactory.get().binaryDecoder(data, null);
>         // Writer schema first, reader schema second
>         DatumReader<T> datumReader =
>                 new SpecificDatumReader<>(writer, reuse.getSchema());
>         try {
>             T datum = datumReader.read(null, decoder);
>             return datum;
>         } catch (IOException e) {
>             System.out.println("Deserialization error " + e.getMessage());
>         }
>         return null;
>     }
> }
> {code}
> The checkReaderWriterCompatibility method confirms that schemas are 
> compatible. But when I deserialize I’m getting the following exception
>  
> {code:java}
> Exception in thread "main" org.apache.avro.AvroTypeException: Found test.simple.v1.Status, expecting test.simple.v2.Status
>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:309)
>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:86)
>     at org.apache.avro.io.ResolvingDecoder.readEnum(ResolvingDecoder.java:260)
>     at org.apache.avro.generic.GenericDatumReader.readEnum(GenericDatumReader.java:267)
>     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:181)
>     at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:136)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
>     at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
>     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
>     at test.EnumEvolutionExample.deSerealizeBinary(EnumEvolutionExample.java:70)
>     at test.EnumEvolutionExample.main(EnumEvolutionExample.java:45)
> {code}
>  
> I don’t understand why Avro thinks it got a v1.Status. Namespaces are not 
> part of the encoding.
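
The point about the encoding can be illustrated with a small sketch. In
Avro's binary encoding an enum value is written as an int holding the
zero-based position of the symbol, using the variable-length zig-zag format.
The class and method names below are hypothetical and the code does not use
the Avro library; it only shows that neither the name nor the namespace
appears in the bytes, so the mismatch can only be detected when the writer
and reader schemas are resolved against each other.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

/** Hypothetical sketch of how an enum symbol ends up as bytes; not Avro's encoder. */
final class EnumEncodingSketch {
    private EnumEncodingSketch() {}

    static byte[] encodeEnum(List<String> symbols, String symbol) {
        int index = symbols.indexOf(symbol);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown symbol: " + symbol);
        }
        // Zig-zag encoding maps small signed ints to small unsigned ints.
        int zigzag = (index << 1) ^ (index >> 31);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Variable-length encoding: 7 bits per byte, high bit set while more bytes follow.
        while ((zigzag & ~0x7F) != 0) {
            out.write((zigzag & 0x7F) | 0x80);
            zigzag >>>= 7;
        }
        out.write(zigzag);
        return out.toByteArray();
    }
}
```

For the symbols ["ON", "OFF"], "ON" is encoded as the single byte 0x00 and
"OFF" as 0x02, regardless of whether the enum is called
test.simple.v1.Status or test.simple.v2.Status.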



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
