[ 
https://issues.apache.org/jira/browse/AVRO-3313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentin updated AVRO-3313:
---------------------------
    Description: 
I wanted to use Avro enums and evolve my schema over time by adding new 
symbols.

The documentation says: 
{code:java}
default: A default value for this enumeration, used during resolution when the 
reader encounters a symbol from the writer that isn't defined in the reader's 
schema (optional). The value provided here must be a JSON string that's a 
member of the symbols array. See documentation on schema resolution for how 
this gets used. {code}
 

The schema resolution section of the documentation says: 

[https://avro.apache.org/docs/current/spec.html#Schema+Resolution]
{code:java}
if both are enums:
if the writer's symbol is not present in the reader's enum and the reader has a 
default value, then that value is used, otherwise an error is signalled. {code}
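
The rule quoted above can be modelled in a few lines of plain Java. This is only an illustrative sketch of the specification text, not Avro's actual implementation; the class name `EnumResolutionSketch` and the method `resolve` are hypothetical and do not exist in the Avro library.

```java
import java.util.List;

/** Illustrative model of the spec's enum resolution rule; NOT Avro's real code. */
final class EnumResolutionSketch {

    /**
     * Resolves a symbol found in the written data against the reader's enum.
     *
     * @param writerSymbol  symbol present in the encoded data
     * @param readerSymbols symbols declared by the reader's enum
     * @param readerDefault the reader enum's "default" attribute, or null if absent
     */
    static String resolve(String writerSymbol, List<String> readerSymbols, String readerDefault) {
        if (readerSymbols.contains(writerSymbol)) {
            return writerSymbol; // symbol is known to the reader: keep it
        }
        if (readerDefault != null) {
            return readerDefault; // unknown symbol, but the reader enum has a default
        }
        // Unknown symbol and no default: the spec says an error is signalled.
        throw new IllegalStateException("No match for " + writerSymbol);
    }

    public static void main(String[] args) {
        List<String> readerV1 = List.of("A", "B");
        System.out.println(resolve("B", readerV1, "A")); // prints "B"
        System.out.println(resolve("C", readerV1, "A")); // prints "A"
    }
}
```

Under this reading, a reader whose enum declares `A` and `B` with default `A` would map an incoming `C` to `A`, which is the behaviour this report expects.
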
This feature was introduced in Avro 1.9.0; see the Enums section of the 
specification: 

[https://avro.apache.org/docs/current/spec.html#Enums]

 

*However, I have found that it does not work the way the specification says.*

Here is an example.

 

Suppose version 1 of the schema, used for writing, has an enum with two 
symbols (A and B) and specifies A as the default.
{code:java}
{
    "type": "record",
    "name": "RecordA",
    "fields":
    [
        {
            "name": "fieldA",
            "type":
            {
                "type": "enum",
                "name": "Enum1",
                "symbols":
                [
                    "A",
                    "B"
                ]
            },
            "default": "A"
        }
    ]
} {code}
Later, when the schema needs to evolve on the writer side, we add a new symbol 
(C) and publish version 2 of the schema.

The default value is still A.
{code:java}
{
    "type": "record",
    "name": "RecordA",
    "fields":
    [
        {
            "name": "fieldA",
            "type":
            {
                "type": "enum",
                "name": "Enum1",
                "symbols":
                [
                    "A",
                    "B",
                    "C"
                ]
            },
            "default": "A"
        }
    ]
} {code}
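
One detail worth checking in the two schemas above: `"default": "A"` is a sibling of `"name"` and `"type"`, so it is the default of the record field `fieldA`. The passage quoted from the specification describes a `default` attribute of the enumeration itself. For comparison, a version 1 schema with the default declared inside the enum type would look like the sketch below. This is an assumption about what may have been intended; whether it changes the behaviour reported here is not established by this report.

```json
{
    "type": "record",
    "name": "RecordA",
    "fields": [
        {
            "name": "fieldA",
            "type": {
                "type": "enum",
                "name": "Enum1",
                "symbols": ["A", "B"],
                "default": "A"
            }
        }
    ]
}
```
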
According to the documentation, a reader using the old schema (version 1) 
should be able to deserialize a payload containing the enum value C that was 
written with the version 2 schema. Since the value C is unknown to the reader, 
it should be deserialized as A.

Again, as the documentation says: 
{code:java}
A default value for this enumeration, used during resolution when the reader 
encounters a symbol from the writer that isn't defined in the reader's schema 
{code}
Either the documentation is wrong or the Avro deserialization code is wrong. 
Since this was an intended feature, I assume that this is a bug and the code 
is wrong.

 

I have forked the repository and created a test to demonstrate the issue: 

[https://github.com/idkw/avro/commit/7d36203c137aa6a728d5b85b87969a3f743b45ee]

The test verifies that a reader using the old schema deserializes the value A 
when it receives the value C. However, it fails with the exception 
`org.apache.avro.AvroTypeException: No match for C`:
{code:java}
@Test
public void enumRecordWithExtendedSchemaCanBeReadIfNewValuesAreUsedUsingDefault() throws Exception {
  Schema readerSchemaV1 = ENUM_AB_RECORD_DEFAULT_A;
  Schema writerSchemaV2 = ENUM_ABC_RECORD_DEFAULT_A;
  // Write a record whose enum field holds "C", a symbol only the V2 schema knows.
  Record record = defaultRecordWithSchema(writerSchemaV2, FIELD_A, new EnumSymbol(writerSchemaV2, "C"));
  byte[] encoded = encodeGenericBlob(record);
  // Read it back with the V1 schema, which does not declare "C".
  Record decodedRecord = decodeGenericBlob(readerSchemaV1, writerSchemaV2, encoded);
  // Expected: the unknown symbol falls back to "A". Actual: AvroTypeException.
  Assert.assertEquals("A", decodedRecord.get(FIELD_A).toString());
}
{code}
 


> enum default value to allow deserializer to deserialize to when encountering 
> new enum symbols doesn't work
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-3313
>                 URL: https://issues.apache.org/jira/browse/AVRO-3313
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.9.0, 1.10.0, 1.9.1, 1.9.2, 1.11.0, 1.10.1, 1.10.2
>            Reporter: Valentin
>            Priority: Major
>         Attachments: image-2022-01-19-14-34-52-879.png, 
> image-2022-01-19-15-04-35-442.png
>
>


