[
https://issues.apache.org/jira/browse/AVRO-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Scott resolved AVRO-3029.
-------------------------
Resolution: Not A Problem
> Specification is a little ambiguous about where enum defaults should be
> defined which might be causing library differences
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: AVRO-3029
> URL: https://issues.apache.org/jira/browse/AVRO-3029
> Project: Apache Avro
> Issue Type: Improvement
> Components: java, python, ruby
> Affects Versions: 1.10.1
> Reporter: Scott
> Priority: Major
>
> In the specification, an enum type can have a `default` attribute. At the
> same time, each field in a record can have a default. On top of that, the
> chart of example default values for fields includes enum in the example.
> So, if I want to define a record with an enum field, where would I put the
> default? Do I define it like this:
> {code:java}
> {
>   "type": "record",
>   "name": "test",
>   "fields": [
>     {
>       "name": "enum",
>       "type": {
>         "type": "enum",
>         "name": "enum_field",
>         "symbols": ["FOO", "BAR"]
>       },
>       "default": "FOO"
>     }
>   ]
> }
> {code}
> Or like this:
> {code:java}
> {
>   "type": "record",
>   "name": "test",
>   "fields": [
>     {
>       "name": "enum",
>       "type": {
>         "type": "enum",
>         "name": "enum_field",
>         "symbols": ["FOO", "BAR"],
>         "default": "FOO"
>       }
>     }
>   ]
> }
> {code}
> I was confused, so I started looking for examples, and it seems I'm not the
> only one confused about this: [this
> Stack Overflow question|https://stackoverflow.com/questions/62596990/avro-schema-evolution-with-enum-deserialization-crashes]
> and https://issues.apache.org/jira/browse/AVRO-2518 put the default at the
> field level, whereas https://issues.apache.org/jira/browse/AVRO-2879 puts the
> default at the enum level.
> So then I started looking at examples in the codebase. There are a
> [Ruby test
> case|https://github.com/apache/avro/blob/7d1e63b219e6d0778bc57195152477adee97fcab/lang/ruby/test/test_schema.rb#L333-L338]
> and a [Java test
> case|https://github.com/apache/avro/blob/7d1e63b219e6d0778bc57195152477adee97fcab/lang/java/avro/src/test/java/org/apache/avro/FooBarSpecificRecord.java#L34]
> that put the default at the enum level.
> Okay, solved, right? Since the test cases have the default at the enum level,
> that's where it should be... but then I tried to create a simple Python
> script (since I'm a Python user) to double-check this, and it seems like the
> Python library disagrees. Here's the example script that uses the default at
> the enum level:
> {code:java}
> import json
> from io import BytesIO
> import avro.schema
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
> writer_schema = {
>     "type": "record",
>     "name": "test",
>     "fields": [
>         {
>             "name": "foo",
>             "type": "string"
>         }
>     ],
> }
> reader_schema = {
>     "type": "record",
>     "name": "test",
>     "fields": [
>         {
>             "name": "foo",
>             "type": "string"
>         },
>         {
>             "name": "enum",
>             "type": {
>                 "type": "enum",
>                 "name": "enum_field",
>                 "symbols": ["FOO", "BAR"],
>                 "default": "FOO",
>             },
>         },
>     ],
> }
> w_schema = avro.schema.parse(json.dumps(writer_schema))
> r_schema = avro.schema.parse(json.dumps(reader_schema))
> bio = BytesIO()
> writer = DataFileWriter(bio, DatumWriter(), w_schema)
> writer.append({"foo": "bar"})
> writer.flush()
> bio.seek(0)
> reader = DataFileReader(bio, DatumReader(w_schema, r_schema))
> for record in reader:
>     print(record)
> {code}
> But when I run that, I get an exception:
> {code:java}
> avro.io.SchemaResolutionException: No default value for field enum
> Writer's Schema: {
>   "type": "record",
>   "name": "test",
>   "fields": [
>     {
>       "type": "string",
>       "name": "foo"
>     }
>   ]
> }
> Reader's Schema: {
>   "type": "record",
>   "name": "test",
>   "fields": [
>     {
>       "type": "string",
>       "name": "foo"
>     },
>     {
>       "type": {
>         "type": "enum",
>         "default": "FOO",
>         "name": "enum_field",
>         "symbols": [
>           "FOO",
>           "BAR"
>         ]
>       },
>       "name": "enum"
>     }
>   ]
> }
> {code}
> And if I change the script to use a reader_schema that has the default on the
> field level like this:
> {code:java}
> reader_schema = {
>     "type": "record",
>     "name": "test",
>     "fields": [
>         {
>             "name": "foo",
>             "type": "string"
>         },
>         {
>             "name": "enum",
>             "type": {
>                 "type": "enum",
>                 "name": "enum_field",
>                 "symbols": ["FOO", "BAR"],
>             },
>             "default": "FOO",
>         },
>     ],
> }
> {code}
> Then it works and prints out the record with the default value for the enum:
> {code:java}
> {'foo': 'bar', 'enum': 'FOO'}
> {code}
> I don't have a Java environment set up to try to run the same type of script
> in Java to verify that implementation, but I would assume based on the test
> case that it works exactly the opposite and expects the default at the enum
> level.
> I think making the libraries consistent could cause massive breakage for
> whichever library doesn't currently conform to what the specification
> intends (which I'm honestly not sure of, given how the spec is currently
> written). Therefore, I think it might be easiest to allow an enum's default
> to be defined at either the field level or the enum level. I maintain the
> `fastavro` library, whose behavior matches the avro Python implementation,
> and I would hate to force a massive breaking change on its users if the
> specification is updated to say that enum default values have to be defined
> at the enum level rather than the field level.
> Please let me know your thoughts and thank you for taking the time to read
> this lengthy message.
>
>
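For illustration only, the fallback proposed in the quoted description (accept an enum default at either the field level or the enum level) might be sketched in plain Python as below. This is an assumption-laden sketch, not the behavior of the avro Python library, fastavro, or any other implementation, and since the issue was resolved as Not A Problem it does not describe a planned change. The function name `resolve_enum_field_default` and the two sample field dicts are hypothetical.

```python
from typing import Any


def resolve_enum_field_default(field: dict) -> Any:
    """Hypothetical helper: find a default for a record field of enum type.

    Prefers the field-level "default"; otherwise falls back to the
    "default" attribute on the enum type itself.
    """
    # Field-level default, as in the reader schema that worked in the script.
    if "default" in field:
        return field["default"]

    # Fallback (the proposal): look at the enum-level default.
    field_type = field.get("type")
    if isinstance(field_type, dict) and field_type.get("type") == "enum":
        if "default" in field_type:
            default = field_type["default"]
            # Assumed sanity check: the default should be one of the symbols.
            if default not in field_type.get("symbols", []):
                raise ValueError(f"{default!r} is not a symbol of the enum")
            return default

    raise LookupError(f"No default value for field {field.get('name')}")


# The two placements from the description, written as Python dicts.
field_level = {
    "name": "enum",
    "type": {"type": "enum", "name": "enum_field", "symbols": ["FOO", "BAR"]},
    "default": "FOO",
}
enum_level = {
    "name": "enum",
    "type": {
        "type": "enum",
        "name": "enum_field",
        "symbols": ["FOO", "BAR"],
        "default": "FOO",
    },
}

print(resolve_enum_field_default(field_level))  # FOO
print(resolve_enum_field_default(enum_level))   # FOO
```

Checking the field level first keeps the behavior the Python script above already relies on, so only schemas that currently fail with "No default value for field" would be affected by the fallback.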
--
This message was sent by Atlassian Jira
(v8.3.4#803005)