[ 
https://issues.apache.org/jira/browse/AVRO-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott resolved AVRO-3029.
-------------------------
    Resolution: Not A Problem

> Specification is a little ambiguous about where enum defaults should be 
> defined which might be causing library differences
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-3029
>                 URL: https://issues.apache.org/jira/browse/AVRO-3029
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java, python, ruby
>    Affects Versions: 1.10.1
>            Reporter: Scott
>            Priority: Major
>
> In the specification, an enum type can have a `default` attribute. At the 
> same time, each field in a record can have a default. On top of that, the 
> chart of example default values for fields includes enum in the example.
> So, if I want to define a record with an enum field, where would I put the 
> default? Do I define it like this:
> {code:java}
> {
>     "type": "record",
>     "name": "test",
>     "fields": [
>         {
>             "name": "enum",
>             "type": {
>                 "type": "enum",
>                 "name": "enum_field",
>                 "symbols": ["FOO", "BAR"]
>             },
>             "default": "FOO"
>         }
>     ]
> }
> {code}
> Or like this:
> {code:java}
> {
>     "type": "record",
>     "name": "test",
>     "fields": [
>         {
>             "name": "enum",
>             "type": {
>                 "type": "enum",
>                 "name": "enum_field",
>                 "symbols": ["FOO", "BAR"],
>                 "default": "FOO"
>             }
>         }
>     ]
> }
> {code}
> I was confused, so I started looking for examples, but it seems I'm not the 
> only one: [this Stack Overflow 
> question|https://stackoverflow.com/questions/62596990/avro-schema-evolution-with-enum-deserialization-crashes]
>  and https://issues.apache.org/jira/browse/AVRO-2518 put the default at the 
> field level, whereas https://issues.apache.org/jira/browse/AVRO-2879 puts it 
> at the enum level.
> So then I started looking at examples in the codebase. It seems like there's 
> a [ruby test 
> case|https://github.com/apache/avro/blob/7d1e63b219e6d0778bc57195152477adee97fcab/lang/ruby/test/test_schema.rb#L333-L338]
>  and [java test 
> case|https://github.com/apache/avro/blob/7d1e63b219e6d0778bc57195152477adee97fcab/lang/java/avro/src/test/java/org/apache/avro/FooBarSpecificRecord.java#L34]
>  that put the default at the enum level.
> Okay, solved, right? Since the test cases have the default at the enum level, 
> that's where it should be... but then I wrote a simple Python script (since 
> I'm a Python user) to double-check this, and it seems like the Python 
> library disagrees. Here's the example script that uses the default at the 
> enum level:
> {code:python}
> import json
> from io import BytesIO
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> writer_schema = {
>     "type": "record",
>     "name": "test",
>     "fields": [
>         {
>             "name": "foo",
>             "type": "string"
>         }
>     ],
> }
> reader_schema = {
>     "type": "record",
>     "name": "test",
>     "fields": [
>         {
>             "name": "foo",
>             "type": "string"
>         },
>         {
>             "name": "enum",
>             "type": {
>                 "type": "enum",
>                 "name": "enum_field",
>                 "symbols": ["FOO", "BAR"],
>                 "default": "FOO",
>             },
>         },
>     ],
> }
> w_schema = avro.schema.parse(json.dumps(writer_schema))
> r_schema = avro.schema.parse(json.dumps(reader_schema))
> bio = BytesIO()
> writer = DataFileWriter(bio, DatumWriter(), w_schema)
> writer.append({"foo": "bar"})
> writer.flush()
> bio.seek(0)
> reader = DataFileReader(bio, DatumReader(w_schema, r_schema))
> for record in reader:
>     print(record)
> {code}
> But when I run that, I get an exception:
> {code:java}
> avro.io.SchemaResolutionException: No default value for field enum
> Writer's Schema: {
>   "type": "record",
>   "name": "test",
>   "fields": [
>     {
>       "type": "string",
>       "name": "foo"
>     }
>   ]
> }
> Reader's Schema: {
>   "type": "record",
>   "name": "test",
>   "fields": [
>     {
>       "type": "string",
>       "name": "foo"
>     },
>     {
>       "type": {
>         "type": "enum",
>         "default": "FOO",
>         "name": "enum_field",
>         "symbols": [
>           "FOO",
>           "BAR"
>         ]
>       },
>       "name": "enum"
>     }
>   ]
> }
> {code}
> And if I change the script to use a reader_schema that has the default at the 
> field level like this:
> {code:python}
> reader_schema = {
>     "type": "record",
>     "name": "test",
>     "fields": [
>         {
>             "name": "foo",
>             "type": "string"
>         },
>         {
>             "name": "enum",
>             "type": {
>                 "type": "enum",
>                 "name": "enum_field",
>                 "symbols": ["FOO", "BAR"],
>             },
>             "default": "FOO",
>         },
>     ],
> }
> {code}
> Then it works and prints out the record with the default value for the enum:
> {code:java}
> {'foo': 'bar', 'enum': 'FOO'}
> {code}
> I don't have a Java environment set up to run the same kind of script in 
> Java and verify that implementation, but based on the test case I would 
> assume that it works the opposite way and expects the default at the enum 
> level.
> I think making the libraries consistent could cause massive breakages for 
> whichever library doesn't currently conform to what the specification 
> intends (which I'm honestly not sure about, given how the spec is currently 
> written). Therefore, I think it might be easiest to allow an enum's default 
> to be defined at either the field level or the enum level. I maintain the 
> `fastavro` library, and its behavior matches the avro Python 
> implementation. I would hate to force a massive breaking change on users if 
> the specification is updated to say that enum default values have to be 
> defined at the enum level rather than the field level.
> Please let me know your thoughts and thank you for taking the time to read 
> this lengthy message.
>  
>  
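
The proposal in the description (accepting an enum's default at either the field level or the enum level) could be sketched roughly as follows. This is only an illustration in plain Python on schema dictionaries: `find_enum_default` is a hypothetical helper, not part of the avro or fastavro APIs, and checking the field level before the enum level is an assumption made for the example rather than anything the specification states.

```python
def find_enum_default(field):
    """Return (default, level) for an inline enum field, or (None, None).

    Hypothetical helper for illustration only. It looks for "default" on the
    record field first, then on the enum type definition (assumed order).
    """
    enum_type = field.get("type")
    if not (isinstance(enum_type, dict) and enum_type.get("type") == "enum"):
        raise ValueError("field %r is not an inline enum" % field.get("name"))

    for level, holder in (("field", field), ("enum", enum_type)):
        if "default" in holder:
            default = holder["default"]
            # A default only makes sense if it names one of the symbols.
            if default not in enum_type["symbols"]:
                raise ValueError("default %r is not a symbol" % default)
            return default, level
    return None, None


# The two layouts from the issue description, plus one with no default.
field_level = {
    "name": "enum",
    "type": {"type": "enum", "name": "enum_field", "symbols": ["FOO", "BAR"]},
    "default": "FOO",
}
enum_level = {
    "name": "enum",
    "type": {
        "type": "enum",
        "name": "enum_field",
        "symbols": ["FOO", "BAR"],
        "default": "FOO",
    },
}
no_default = {
    "name": "enum",
    "type": {"type": "enum", "name": "enum_field", "symbols": ["FOO", "BAR"]},
}

for candidate in (field_level, enum_level, no_default):
    print(find_enum_default(candidate))
```

A permissive lookup like this treats both layouts from the description as equivalent, which would avoid forcing a breaking change on either group of users; whether the two attributes are meant to be interchangeable is exactly the question the issue raises.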



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
