[ 
https://issues.apache.org/jira/browse/AVRO-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733178#comment-17733178
 ] 

Ryan Skraba commented on AVRO-3760:
-----------------------------------

Hello!  I think that your example code might be wrong, but the bug exists 
regardless.

If I understand the intention of AVRO-1340 correctly, the test code must give 
the reader both schemas (writer's schema first, reader's schema second) in 
order to use the default:

{code}
    reader = DatumReader(future_schema, current_schema)
{code}

In this case you still get an exception: 
**avro.errors.SchemaResolutionException: Symbol crc32_be not present in 
Reader's Schema**

It's a bit vague to me, but my understanding is that the default in an enum is 
meant to serve as the "fail safely" value when a symbol is removed during 
schema evolution, but not for corrupted or unexpected data (like an enum 
index out of bounds).  Would it suit your needs if the default symbol was only 
used when both schemas are known?

To be clear, the bug is still valid, just the implementation would change.
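
To make the proposed behaviour concrete, here is a minimal sketch of the resolution rule as I understand it. It is plain Python, not the code in avro/io.py, and the function name and parameters are hypothetical, chosen only for illustration. The enum default is applied only when the writer's symbol is known and is missing from the reader's symbols; an out-of-range index is still an error.

```python
def resolve_enum_symbol(writer_symbols, reader_symbols, index, reader_default=None):
    """Illustrative enum resolution; NOT the avro library's implementation.

    writer_symbols / reader_symbols: symbol lists from the two schemas.
    index: the enum index decoded from the binary data.
    reader_default: the reader enum's "default" symbol, if it declares one.
    """
    # An index outside the writer's symbols means corrupted or unexpected
    # data (or a wrong writer's schema), so the default is not applied.
    if not 0 <= index < len(writer_symbols):
        raise ValueError(
            f"Can't access enum index {index} for enum "
            f"with {len(writer_symbols)} symbols"
        )

    symbol = writer_symbols[index]
    if symbol in reader_symbols:
        return symbol

    # Schema evolution case: the writer knows a symbol the reader does not.
    if reader_default is not None:
        return reader_default

    raise LookupError(f"Symbol {symbol} not present in Reader's Schema")


current_symbols = ["unknown", "xxhash3_64_be"]
future_symbols = ["unknown", "xxhash3_64_be", "crc32_be"]

# Index 2 is "crc32_be" in the future (writer's) schema.
print(resolve_enum_symbol(future_symbols, current_symbols, 2, "unknown"))  # unknown
```

With only the current schema available (used as both writer's and reader's), index 2 is out of range and the sketch raises, which matches the exception in the original report.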

> Using enum with default symbol, cannot parse future value
> ---------------------------------------------------------
>
>                 Key: AVRO-3760
>                 URL: https://issues.apache.org/jira/browse/AVRO-3760
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: python
>    Affects Versions: 1.11.1
>         Environment: {code}
> $ pip freeze | grep -i avro
> avro==1.11.1
> $ python --version
> Python 3.8.16
> {code}
>            Reporter: Anton Agestam
>            Assignee: Anton Agestam
>            Priority: Major
>             Fix For: 1.11.2
>
>
> It seems like support for default symbols is broken. In the example below, 
> since I'm using default symbols, I expected to be able to add new values to 
> the enum and see the default value when parsing using the old schema.
> {code:python}
> import io
> from avro.io import DatumReader, DatumWriter, BinaryDecoder, BinaryEncoder
> import avro.schema
> current_schema = avro.schema.parse("""
> {
>     "fields": [
>         {
>             "default": "unknown",
>             "name": "checksum_algorithm",
>             "type": {
>                 "name": "ChecksumAlgorithm",
>                 "symbols": [
>                     "unknown",
>                     "xxhash3_64_be"
>                 ],
>                 "type": "enum",
>                 "default": "unknown"
>             }
>         }
>     ],
>     "name": "Metadata",
>     "type": "record"
> }
> """)
> # Future schema adds the "crc32_be" symbol.
> future_schema = avro.schema.parse("""
> {
>     "fields": [
>         {
>             "default": "unknown",
>             "name": "checksum_algorithm",
>             "type": {
>                 "name": "ChecksumAlgorithm",
>                 "symbols": [
>                     "unknown",
>                     "xxhash3_64_be",
>                     "crc32_be"
>                 ],
>                 "type": "enum",
>                 "default": "unknown"
>             }
>         }
>     ],
>     "name": "Metadata",
>     "type": "record"
> }
> """)
> with io.BytesIO() as buffer:
>     writer = DatumWriter(future_schema)
>     encoder = BinaryEncoder(buffer)
>     writer.write({"checksum_algorithm": "crc32_be"}, encoder)
>     buffer.seek(0)
>     reader = DatumReader(current_schema)
>     decoder = BinaryDecoder(buffer)
>     decoded = reader.read(decoder)
> print(decoded)
> {code}
> Instead, this results in an exception:
> {code}
> Traceback (most recent call last):
>   File "reproduce-avro.py", line 58, in <module>
>     decoded = reader.read(decoder)
>   File 
> "/Users/anton/.pyenv/versions/karapace/lib/python3.8/site-packages/avro/io.py",
>  line 649, in read
>     return self.read_data(self.writers_schema, self.readers_schema, decoder)
>   File 
> "/Users/anton/.pyenv/versions/karapace/lib/python3.8/site-packages/avro/io.py",
>  line 727, in read_data
>     return self.read_record(writers_schema, readers_schema, decoder)
>   File 
> "/Users/anton/.pyenv/versions/karapace/lib/python3.8/site-packages/avro/io.py",
>  line 922, in read_record
>     field_val = self.read_data(field.type, readers_field.type, decoder)
>   File 
> "/Users/anton/.pyenv/versions/karapace/lib/python3.8/site-packages/avro/io.py",
>  line 720, in read_data
>     return self.read_enum(writers_schema, readers_schema, decoder)
>   File 
> "/Users/anton/.pyenv/versions/karapace/lib/python3.8/site-packages/avro/io.py",
>  line 779, in read_enum
>     raise avro.errors.SchemaResolutionException(
> avro.errors.SchemaResolutionException: Can't access enum index 2 for enum 
> with 2 symbols
> Writer's Schema: {
>   "type": "enum",
>   "default": "unknown",
>   "name": "ChecksumAlgorithm",
>   "symbols": [
>     "unknown",
>     "xxhash3_64_be"
>   ]
> }
> Reader's Schema: {
>   "type": "enum",
>   "default": "unknown",
>   "name": "ChecksumAlgorithm",
>   "symbols": [
>     "unknown",
>     "xxhash3_64_be"
>   ]
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
