The AVRO enum concept is highly impractical to use in real-life scenarios (IMHO).
Default value or not, adding an enum value breaks backward/forward 
compatibility (and a default value is not going to cut it).

To keep the "enum" concept in my AVRO contracts, I chose to create a record that 
holds a single integer value:
/** 'enum' definition. Supported values:
-1: unknown
1: debit
2: credit
*/
record PaymentTypeCD { int value; }

/** usage in an AVRO payload */
com.ingenico.shared.enums.PaymentTypeCD paymentType;

- it maintains some level of 'strong' typing for Java/C#
- with some basic code generation, developers don't see the difference between 
"real" enums and fake enums
- a producer can issue a new enum value without breaking the consumer ("can't 
deserialize 'my_new_value'...")
- a consumer doesn't have to lose the data (e.g. by falling back to a default 
value) even if the enum value is not processed (and the proposal forces unknown 
values to be lost)
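
A minimal sketch of what such a "fake enum" could look like on the Java side, 
assuming a small hand-written wrapper rather than real Avro-generated code. 
The names PaymentType, fromCode and isRecognized are hypothetical; only the 
codes -1, 1 and 2 come from the definition above.

```java
/** Hypothetical hand-written wrapper around the int carried by PaymentTypeCD. */
final class PaymentType {
    static final PaymentType UNKNOWN = new PaymentType(-1, true);
    static final PaymentType DEBIT = new PaymentType(1, true);
    static final PaymentType CREDIT = new PaymentType(2, true);

    private final int code;
    private final boolean recognized;

    private PaymentType(int code, boolean recognized) {
        this.code = code;
        this.recognized = recognized;
    }

    /** Documented codes map to shared constants; any other code is kept as-is. */
    static PaymentType fromCode(int code) {
        switch (code) {
            case -1: return UNKNOWN;
            case 1:  return DEBIT;
            case 2:  return CREDIT;
            default: return new PaymentType(code, false);
        }
    }

    /** The raw value, which can be written back unchanged. */
    int code() { return code; }

    /** True only for the codes listed in the contract's documentation. */
    boolean isRecognized() { return recognized; }

    @Override public boolean equals(Object o) {
        return o instanceof PaymentType && ((PaymentType) o).code == code;
    }

    @Override public int hashCode() { return Integer.hashCode(code); }
}
```

A consumer that receives a code it does not know (say 3) still holds the 
original value and can forward it untouched, which is the point of the last 
bullet above.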

Lowlight: enum values are no longer part of the contract
Highlight: full backward/forward compatibility with some minimal type 
information

I understand it might feel a bit too C/C++, but being able to read/write 
contracts without information loss was my main driver.

Regards

-----Original Message-----
From: Adam Bellemare (JIRA) [mailto:j...@apache.org]
Sent: Saturday, January 13, 2018 22:09
To: dev@avro.apache.org
Subject: [jira] [Comment Edited] (AVRO-1340) use default to allow old readers 
to specify default enum value when encountering new enum symbols


    [ https://issues.apache.org/jira/browse/AVRO-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16325334#comment-16325334 ]

Adam Bellemare edited comment on AVRO-1340 at 1/13/18 9:08 PM:
---------------------------------------------------------------

Avro enum "UNKNOWN" defaults have become extremely important to our company in 
the past while, especially as we're using Kafka and Avro integrations 
extensively. This ticket is very relevant to what we're doing. Here are my 
thoughts, let me know if I am missing something. I've been following this 
thread for a while and I'm hoping that I can help get it moving towards some 
form of resolution.



enum values have a specific meaning tied to them. Aliasing works well in the 
following conditions:

1) When the value added is entirely NEW to the data producer, and should 
therefore be aliased to UNKNOWN. If you alias it to an existing enum value you 
are redefining the data contract of that value. In this case a conversation 
should occur between the producer and the consumers of this data as it is now 
about renegotiating the data contract.

2) When the new enum values to be added are a COMPLETE SUBSET of 
an existing enum. For example, if the producer produces all 3xx 
HttpResponseCode as 300, splitting the enum value into 300, 301 and 302 and 
aliasing them all to 300 makes sense. It was always 300, and adding more 
granularity to the current schema is OK as it maps directly back to the single, 
original enum value.
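
The two cases can be pictured as a lookup from new writer symbols to symbols 
the old reader already knows. This is only an illustration of the idea under 
discussion, not an Avro API: the class EnumAliases, the method resolve and 
the symbol names are all hypothetical.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of enum aliasing; not part of the Avro library. */
final class EnumAliases {
    // Symbols the old reader schema already understands.
    private static final Set<String> READER_SYMBOLS = Set.of("HTTP_300", "UNKNOWN");

    // Case 2: finer-grained symbols that map straight back to the original one.
    private static final Map<String, String> ALIASES =
            Map.of("HTTP_301", "HTTP_300", "HTTP_302", "HTTP_300");

    static String resolve(String writerSymbol) {
        if (READER_SYMBOLS.contains(writerSymbol)) {
            return writerSymbol;
        }
        // Case 1: an entirely new symbol has no alias and becomes UNKNOWN.
        return ALIASES.getOrDefault(writerSymbol, "UNKNOWN");
    }
}
```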

The only real value I can see aliasing adding is for #2 above, as #1 is the 
same as having a default field for unknown values. #2 above is a scenario that 
I have not yet encountered, and I question how common it is. Without aliasing 
it would also be possible to work around that issue, simply by creating a new 
enum entry with the newly defined enum values and eventually phasing out the 
old one. Note that this would be highly specific to the scenario where you need 
to split an enum value into a complete subset of other values. Additions of 
enums can be done easily, as the UNKNOWN default value will simply be used by 
older reader schemas.
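
How an older reader would fall back to its default can be sketched as 
follows. This illustrates the behaviour proposed in this ticket and is not 
the Avro library's resolution code; EnumResolution and resolve are 
hypothetical names. The sketch assumes, as the ticket description does, that 
a reader without a default keeps failing.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the proposed default-based enum resolution. */
final class EnumResolution {
    static String resolve(String writerSymbol,
                          List<String> readerSymbols,
                          Optional<String> readerDefault) {
        // A symbol known to the reader resolves to itself.
        if (readerSymbols.contains(writerSymbol)) {
            return writerSymbol;
        }
        // Otherwise use the reader's default, or signal an error if there is none.
        return readerDefault.orElseThrow(() ->
                new IllegalStateException("No match for enum symbol: " + writerSymbol));
    }
}
```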

Redefining enum values via aliasing can be extremely dangerous. For instance, 
if HttpResponseCode = 300 was always ONLY just 300, then aliasing 301 and 302 
to it breaks the definition of the enum value and can have consequences for 
downstream consumers of this data. As it stands I have major concerns that 
adding aliasing to enum values will greatly weaken the "data contract" aspect 
of a given enum as it would normalize the redefinition of enum values in a way 
that is transparent to the consumers of the data.


was (Author: abellemare):
Avro enum "UNKNOWN" defaults have become extremely important to our company in 
the past while, especially as we're using Kafka and Avro integrations 
extensively. This ticket is very relevant to what we're doing. Here are my 
thoughts, let me know if I am missing something. I've been following this 
thread for a while and I'm hoping that I can help get it moving towards some 
form of resolution, otherwise we're going to have to fork our own Avro 
implementation.




> use default to allow old readers to specify default enum value when 
> encountering new enum symbols
> -------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-1340
>                 URL: https://issues.apache.org/jira/browse/AVRO-1340
>             Project: Avro
>          Issue Type: Improvement
>          Components: spec
>         Environment: N/A
>            Reporter: Jim Donofrio
>            Priority: Minor
>
> The schema resolution page says:
> > if both are enums:
> > if the writer's symbol is not present in the reader's enum, then an
> > error is signalled.
> This makes it difficult to use enums because you can never add an enum value 
> and keep old readers compatible. Why not use the default option to refer to 
> one of the enum values, so that when an old reader encounters an enum ordinal 
> it does not recognize, it can default to the one optionally provided by the 
> schema? If the old schema does not provide a default, then the older reader 
> can continue to fail as it does today.


