[ https://issues.apache.org/jira/browse/AVRO-3313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17648414#comment-17648414 ]
Oscar Westra van Holthe - Kind commented on AVRO-3313:
------------------------------------------------------
The PR does not address the bug in the test case (mentioned by Christophe): the
default value is defined on the field instead of the enum.
Worse, the PR introduces a dead branch (one that will not be used at runtime),
unless the writer schema is actually wrong (which is a user bug). And in case
of such a bug, it breaks the data even further by filling in a default value
for invalid (not unknown!) enum values.
I agree with the sentiment: you want the record to deserialise with the enum
value "A" if the written enum value is unknown to the reader.
You can accomplish that by adjusting your schemas like this (note the moved
"default" property):
{code:java}
{
  "type": "record",
  "name": "RecordA",
  "fields": [
    {
      "name": "fieldA",
      "type": {
        "type": "enum",
        "name": "Enum1",
        "default": "A",
        "symbols": ["A", "B", "C"]
      }
    }
  ]
} {code}
This becomes more apparent using the IDL syntax (which also supports comments
in addition to documentation). This is the equivalent of your schema version 1:
{code:java}
record RecordA {
  // A is the default value for the field when writing, or if the field is
  // missing from the writer schema
  Enum1 fieldA = "A";
}
enum Enum1 { A, B } {code}
And this is the equivalent of the fixed version in this comment:
{code:java}
record RecordA {
  Enum1 fieldA;
}
// A is the default value for the enum when the writer schema contains an
// unknown enum value (e.g. C)
enum Enum1 { A, B } = A;{code}
(note that this should be placed inside a dummy protocol; the code blocks above
are fully equivalent when the PRs for AVRO-3404 and AVRO-3666 are merged)
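To make the difference between the two defaults concrete, here is a minimal sketch of the resolution rule for enums as the specification describes it. It is a toy model, not Avro's implementation: the class name EnumResolutionSketch and the method resolveSymbol are hypothetical and exist only for this illustration. The point it tries to show is that only a default declared on the enum itself takes part in resolving an unknown symbol; a default on the record field is not consulted.

```java
import java.util.List;

/** Toy model of the enum resolution rule; hypothetical, NOT Avro's implementation. */
final class EnumResolutionSketch {

    /**
     * Resolves a symbol from the written data against the reader's enum.
     *
     * @param writerSymbol  symbol found in the written data
     * @param readerSymbols symbols declared by the reader's enum
     * @param enumDefault   "default" declared on the reader's enum itself, or null if absent
     */
    static String resolveSymbol(String writerSymbol, List<String> readerSymbols, String enumDefault) {
        if (readerSymbols.contains(writerSymbol)) {
            return writerSymbol; // known symbol: passed through unchanged
        }
        if (enumDefault != null) {
            return enumDefault; // unknown symbol: fall back to the enum default
        }
        // A default on the record field plays no part here; it only applies
        // when the whole field is missing from the writer schema.
        throw new IllegalStateException("No match for " + writerSymbol);
    }

    public static void main(String[] args) {
        List<String> readerSymbols = List.of("A", "B");
        System.out.println(resolveSymbol("B", readerSymbols, "A")); // prints "B"
        System.out.println(resolveSymbol("C", readerSymbols, "A")); // prints "A"
        try {
            // Same reader enum, but without an enum-level default.
            resolveSymbol("C", readerSymbols, null);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "No match for C"
        }
    }
}
```

The last call corresponds to the schemas in the issue description: the reader's enum has no default of its own, so the unknown symbol C cannot be resolved, regardless of the default on fieldA.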
> enum default value to allow deserializer to deserialize to when encountering
> new enum symbols doesn't work
> ----------------------------------------------------------------------------------------------------------
>
> Key: AVRO-3313
> URL: https://issues.apache.org/jira/browse/AVRO-3313
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.9.0, 1.10.0, 1.9.1, 1.9.2, 1.11.0, 1.10.1, 1.10.2
> Reporter: Valentin
> Priority: Major
> Labels: avro, enum, java, schema-evolution
>
> I wanted to use Avro enums and evolve my schema over time by adding new
> symbols.
> The documentation says:
> {code:java}
> default: A default value for this enumeration, used during resolution when
> the reader encounters a symbol from the writer that isn't defined in the
> reader's schema (optional). The value provided here must be a JSON string
> that's a member of the symbols array. See documentation on schema resolution
> for how this gets used. {code}
>
> And the section of the documentation about schema resolution says :
> [https://avro.apache.org/docs/current/spec.html#Schema+Resolution]
> {code:java}
> if both are enums:
> if the writer's symbol is not present in the reader's enum and the reader has
> a default value, then that value is used, otherwise an error is signalled.
> {code}
> This feature was introduced in Avro 1.9.0 with this issue:
> [https://avro.apache.org/docs/current/spec.html#Enums]
>
> *However, I have found that it does not work the way the specification
> says.*
> Here is an example.
>
> Suppose version 1 of the schema used for writing has two symbols (A and B)
> and specifies symbol A as the default.
> {code:java}
> {
>   "type": "record",
>   "name": "RecordA",
>   "fields": [
>     {
>       "name": "fieldA",
>       "type": {
>         "type": "enum",
>         "name": "Enum1",
>         "symbols": ["A", "B"]
>       },
>       "default": "A"
>     }
>   ]
> } {code}
> Later, when the schema needs to evolve on the writer side, we add a new
> symbol (C) and publish the schema as version 2.
> The default value is still A.
> {code:java}
> {
>   "type": "record",
>   "name": "RecordA",
>   "fields": [
>     {
>       "name": "fieldA",
>       "type": {
>         "type": "enum",
>         "name": "Enum1",
>         "symbols": ["A", "B", "C"]
>       },
>       "default": "A"
>     }
>   ]
> } {code}
> According to the documentation, a reader using the old schema (version 1)
> should be able to deserialize a payload containing the enum value C that was
> written with the schema in version 2. Since the value C is unknown to the
> reader, it should be deserialized as A.
> Again, the documentation says:
> {code:java}
> A default value for this enumeration, used during resolution when the reader
> encounters a symbol from the writer that isn't defined in the reader's schema
> {code}
> The issue here is either the documentation is wrong or the avro
> deserialization code is wrong. Since this was an intended feature I assume
> that this is a bug and the code is wrong.
>
> I have forked the repository and created a test to demonstrate the issue:
> [https://github.com/idkw/avro/commit/7d36203c137aa6a728d5b85b87969a3f743b45ee]
> The test verifies that a reader using the old schema deserializes the value
> A when receiving the value C. However, it fails with the exception
> `org.apache.avro.AvroTypeException: No match for C`
> {code:java}
> @Test
> public void enumRecordWithExtendedSchemaCanBeReadIfNewValuesAreUsedUsingDefault() throws Exception {
>   Schema readerSchemaV1 = ENUM_AB_RECORD_DEFAULT_A;
>   Schema writerSchemaV2 = ENUM_ABC_RECORD_DEFAULT_A;
>   Record record = defaultRecordWithSchema(
>     writerSchemaV2,
>     FIELD_A,
>     new EnumSymbol(writerSchemaV2, "C")
>   );
>   byte[] encoded = encodeGenericBlob(record);
>   Record decodedRecord = decodeGenericBlob(
>     readerSchemaV1,
>     writerSchemaV2,
>     encoded
>   );
>   Assert.assertEquals("A", decodedRecord.get(FIELD_A).toString());
> } {code}
>
> It should not fail but deserialize to "A".
--
This message was sent by Atlassian Jira
(v8.20.10#820010)