Anton Agestam created AVRO-3760:
-----------------------------------

             Summary: Using enum with default symbol, cannot parse future value
                 Key: AVRO-3760
                 URL: https://issues.apache.org/jira/browse/AVRO-3760
             Project: Apache Avro
          Issue Type: Bug
          Components: python
    Affects Versions: 1.11.1
         Environment: {code:shell}
$ pip freeze | grep -i avro
avro==1.11.1
$ python --version
Python 3.8.16
{code}
            Reporter: Anton Agestam


Support for enum default symbols appears to be broken in the Python
implementation. In the example below, both schemas declare a default symbol,
so I expected to be able to add new symbols to the enum and get the default
symbol back when reading with the old schema.

{code:python}
import io
from avro.io import DatumReader, DatumWriter, BinaryDecoder, BinaryEncoder
import avro.schema

current_schema = avro.schema.parse("""
{
    "fields": [
        {
            "default": "unknown",
            "name": "checksum_algorithm",
            "type": {
                "name": "ChecksumAlgorithm",
                "symbols": [
                    "unknown",
                    "xxhash3_64_be"
                ],
                "type": "enum",
                "default": "unknown"
            }
        }
    ],
    "name": "Metadata",
    "type": "record"
}
""")

# Future schema adds the "crc32_be" symbol.
future_schema = avro.schema.parse("""
{
    "fields": [
        {
            "default": "unknown",
            "name": "checksum_algorithm",
            "type": {
                "name": "ChecksumAlgorithm",
                "symbols": [
                    "unknown",
                    "xxhash3_64_be",
                    "crc32_be"
                ],
                "type": "enum",
                "default": "unknown"
            }
        }
    ],
    "name": "Metadata",
    "type": "record"
}
""")


with io.BytesIO() as buffer:
    writer = DatumWriter(future_schema)
    encoder = BinaryEncoder(buffer)
    writer.write({"checksum_algorithm": "crc32_be"}, encoder)
    buffer.seek(0)

    reader = DatumReader(current_schema)
    decoder = BinaryDecoder(buffer)
    decoded = reader.read(decoder)

print(decoded)
{code}

Instead, this results in an exception:

{code:java}
Traceback (most recent call last):
  File "reproduce-avro.py", line 58, in <module>
    decoded = reader.read(decoder)
  File "/Users/anton/.pyenv/versions/karapace/lib/python3.8/site-packages/avro/io.py", line 649, in read
    return self.read_data(self.writers_schema, self.readers_schema, decoder)
  File "/Users/anton/.pyenv/versions/karapace/lib/python3.8/site-packages/avro/io.py", line 727, in read_data
    return self.read_record(writers_schema, readers_schema, decoder)
  File "/Users/anton/.pyenv/versions/karapace/lib/python3.8/site-packages/avro/io.py", line 922, in read_record
    field_val = self.read_data(field.type, readers_field.type, decoder)
  File "/Users/anton/.pyenv/versions/karapace/lib/python3.8/site-packages/avro/io.py", line 720, in read_data
    return self.read_enum(writers_schema, readers_schema, decoder)
  File "/Users/anton/.pyenv/versions/karapace/lib/python3.8/site-packages/avro/io.py", line 779, in read_enum
    raise avro.errors.SchemaResolutionException(
avro.errors.SchemaResolutionException: Can't access enum index 2 for enum with 2 symbols
Writer's Schema: {
  "type": "enum",
  "default": "unknown",
  "name": "ChecksumAlgorithm",
  "symbols": [
    "unknown",
    "xxhash3_64_be"
  ]
}
Reader's Schema: {
  "type": "enum",
  "default": "unknown",
  "name": "ChecksumAlgorithm",
  "symbols": [
    "unknown",
    "xxhash3_64_be"
  ]
}
{code}
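
The expected behaviour can be sketched in plain Python, without the avro package. This is a simplified model of the enum rule in the Avro specification's schema resolution section (a writer's symbol that is missing from the reader's enum falls back to the reader's default, if one is declared). It is not the library's code: resolve_enum and EnumResolutionError are hypothetical names, and the out-of-range check is an assumption that mirrors the message in the traceback. The traceback prints a Writer's Schema with only two symbols, which suggests the reader never saw the future schema; the second call models that situation.

```python
from typing import Optional, Sequence


class EnumResolutionError(Exception):
    """Hypothetical error: a written enum value cannot be mapped to the reader's enum."""


def resolve_enum(
    index: int,
    writer_symbols: Sequence[str],
    reader_symbols: Sequence[str],
    reader_default: Optional[str] = None,
) -> str:
    """Map an encoded enum index to a symbol the reader understands."""
    # The binary encoding stores only the position in the WRITER's symbol list,
    # so the reader needs the writer's symbols to recover the name.
    if not 0 <= index < len(writer_symbols):
        raise EnumResolutionError(
            f"Can't access enum index {index} for enum with {len(writer_symbols)} symbols"
        )
    symbol = writer_symbols[index]
    if symbol in reader_symbols:
        return symbol
    # Spec rule: fall back to the reader's default when the symbol is unknown.
    if reader_default is not None:
        return reader_default
    raise EnumResolutionError(f"Symbol {symbol} not present in reader's enum")


current = ["unknown", "xxhash3_64_be"]
future = current + ["crc32_be"]

# Case 1: the reader knows the writer's (future) symbols, so "crc32_be"
# resolves to the reader's default.
resolved = resolve_enum(2, future, current, reader_default="unknown")
print(resolved)

# Case 2: the reader's own symbols are used as the writer's symbols, as the
# traceback suggests happened. Index 2 is out of range and the default is
# never consulted.
try:
    resolve_enum(2, current, current, reader_default="unknown")
except EnumResolutionError as exc:
    failure = str(exc)
print(failure)
```

If that reading of the traceback is right, constructing the reader with both schemas, for example DatumReader(future_schema, current_schema), would be the first thing to try. This assumes the first positional parameter of DatumReader is the writer's schema. Whether 1.11.1 then applies the default is not something this sketch can show.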



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
