Hi Frederic,

Please be aware that having a default value, and using it for enum strings that cannot be parsed, would still solve your problem.

E.g.:

The first schema has enum { UNKNOWN, MALE, FEMALE } with default: UNKNOWN.

The second schema defines an additional value, e.g. { UNKNOWN, MALE, FEMALE, NON_BINARY }.

When data containing NON_BINARY is parsed with the first schema, the parser would marshal it into UNKNOWN (e.g. for conversion into the default: clause of a switch statement in the old code).
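
For concreteness, a minimal Java sketch of the two schemas (the Gender name is made up for the example, and the resolution behaviour described in the comments is the proposed default handling, not that of current releases):

import org.apache.avro.Schema;

public class EnumDefaultSketch {
    public static void main(String[] args) {
        // Old (reader) schema: the proposed "default" attribute names one of its own symbols.
        Schema reader = new Schema.Parser().parse(
            "{\"type\": \"enum\", \"name\": \"Gender\","
            + " \"symbols\": [\"UNKNOWN\", \"MALE\", \"FEMALE\"],"
            + " \"default\": \"UNKNOWN\"}");

        // New (writer) schema: adds a symbol the old reader does not know.
        Schema writer = new Schema.Parser().parse(
            "{\"type\": \"enum\", \"name\": \"Gender\","
            + " \"symbols\": [\"UNKNOWN\", \"MALE\", \"FEMALE\", \"NON_BINARY\"]}");

        // Under the proposal, resolving writer -> reader would turn NON_BINARY
        // into UNKNOWN instead of signalling an error.
        System.out.println(reader);
        System.out.println(writer);
    }
}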

Other schema definition languages, e.g. ASN.1, have had this feature ("triple dots" extensibility) for at least 20 years, so it is not really a new concept.

I'd vote for the simpler default option, as it provides much of the needed functionality (I agree that the current enum solution is impractical, if not impossible, to use in practice).

You do have a point that with your "named-int" proposal, the old code would still be able to write the data without losing information. But in practice, I would still prefer the default values.

$0.02

/Anders


On 2018-01-15 09:59, Frédéric SOUCHU wrote:
The AVRO enum concept is highly impractical to use in real-life scenarios (IMHO).
Default value or not, adding an enum value breaks backward/forward 
compatibility (and a default value is not going to cut it).

To keep the "enum" concept in my AVRO contracts, I chose to create a record that
holds a single integer value:
/** 'enum' definition. Supported values:
-1: unknown
1: debit
2: credit
*/
record PaymentTypeCD { int value; }

      /** usage in an AVRO payload */
      com.ingenico.shared.enums.PaymentTypeCD paymentType;

- it maintains some level of 'strong' typing for Java/C#
- with some basic code generation, developers don't see the difference between
"real" enums and fake enums (see the Java sketch below)
- a producer can issue a new enum value without breaking the consumer (no "can't
deserialize 'my_new_value'..." errors)
- a consumer doesn't have to lose the data (e.g. to a default value) even if the
enum value is not processed (whereas the proposal forces unknown values to be lost)
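
As a rough illustration of the second bullet, the hand-written (or generated) wrapper on the Java side could look something like this; only the -1/1/2 codes come from the PaymentTypeCD record above, the class and method names are illustrative:

/** Illustrative wrapper around the named-int record. */
public enum PaymentType {
    UNKNOWN(-1), DEBIT(1), CREDIT(2);

    private final int code;

    PaymentType(int code) {
        this.code = code;
    }

    /** Code to store in the PaymentTypeCD.value field. */
    public int toCode() {
        return code;
    }

    /** Codes written by newer producers fall back to UNKNOWN instead of failing. */
    public static PaymentType fromCode(int code) {
        for (PaymentType t : values()) {
            if (t.code == code) {
                return t;
            }
        }
        return UNKNOWN;
    }
}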

Lowlight: enum values are no longer part of the contract
Highlight: full backward/forward compatibility with some minimal type 
information

I understand it might feel a bit too C/C++, but being able to read/write
contracts without information loss was my main driver.

Regards

-----Original Message-----
From: Adam Bellemare (JIRA) [mailto:j...@apache.org]
Sent: Saturday, 13 January 2018 22:09
To: dev@avro.apache.org
Subject: [jira] [Comment Edited] (AVRO-1340) use default to allow old readers 
to specify default enum value when encountering new enum symbols


     [ https://issues.apache.org/jira/browse/AVRO-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16325334#comment-16325334 ]

Adam Bellemare edited comment on AVRO-1340 at 1/13/18 9:08 PM:
---------------------------------------------------------------

Avro enum "UNKNOWN" defaults has become extremely important to our company in 
the past while, especially as we're using Kafka and Avro integrations extensively. This 
ticket is very relevant to what we're doing. Here are my thoughts, let me know if I am 
missing something. I've been following this thread for a while and I'm hoping that I can 
help get it moving towards some form of resolution.



Enum values have a specific meaning tied to them. Aliasing works well in the
following conditions:

1) When the value added is entirely NEW to the data producer, and should
therefore be aliased to UNKNOWN. If you alias it to an existing enum value you
are redefining the data contract of that value. In this case a conversation
should occur between the producer and the consumers of this data, since it
amounts to renegotiating the data contract.

2) When the new enum values to be added are entirely a COMPLETE SUBSET of
an existing enum. For example, if the producer produces all 3xx 
HttpResponseCode as 300, splitting the enum value into 300, 301 and 302 and 
aliasing them all to 300 makes sense. It was always 300, and adding more 
granularity to the current schema is OK as it maps directly back to the single, 
original enum value.

The only real value I can see aliasing adding is for #2 above, as #1 is the 
same as having a default field for unknown values. #2 above is a scenario that 
I have not yet encountered, and I question how common it is. Without aliasing 
it would also be possible to work around that issue, simply by creating a new 
enum entry with the newly defined enum values and eventually phasing out the 
old one. Note that this would be highly specific to the scenario where you need 
to split an enum value into a complete subset of other values. Additions of 
enum values can be done easily, as the UNKNOWN default value will simply be used by
older reader schemas.

Redefining enum values via aliasing can be extremely dangerous. For instance, if 
HttpResponseCode = 300 was always ONLY just 300, then aliasing 301 and 302 to it breaks 
the definition of the enum value and can have consequences for downstream consumers of 
this data. As it stands I have major concerns that adding aliasing to enum values will 
greatly weaken the "data contract" aspect of a given enum as it would normalize 
the redefinition of enum values in a way that is transparent to the consumers of the data.



use default to allow old readers to specify default enum value when 
encountering new enum symbols
-------------------------------------------------------------------------------------------------

                 Key: AVRO-1340
                 URL: https://issues.apache.org/jira/browse/AVRO-1340
             Project: Avro
          Issue Type: Improvement
          Components: spec
         Environment: N/A
            Reporter: Jim Donofrio
            Priority: Minor

The schema resolution page says:
    if both are enums:
    if the writer's symbol is not present in the reader's enum, then an
    error is signalled.
This makes it difficult to use enums because you can never add an enum value
and keep old readers compatible. Why not use the default option to refer to
one of the enum values, so that when an old reader encounters an enum ordinal it does
not recognize, it can default to the optional, schema-provided one? If the old
schema does not provide a default, then the older reader can continue to fail as
it does today.
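
For reference, a small Java snippet (with made-up Suit schemas) showing the incompatibility described above using the existing compatibility checker; with the proposed default, this is the case that would resolve to the default symbol instead of failing:

import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;

public class EnumEvolutionCheck {
    public static void main(String[] args) {
        // Old reader schema.
        Schema reader = new Schema.Parser().parse(
            "{\"type\": \"enum\", \"name\": \"Suit\","
            + " \"symbols\": [\"SPADES\", \"HEARTS\"]}");

        // New writer schema with an added symbol.
        Schema writer = new Schema.Parser().parse(
            "{\"type\": \"enum\", \"name\": \"Suit\","
            + " \"symbols\": [\"SPADES\", \"HEARTS\", \"DIAMONDS\"]}");

        // Without a default, the old reader cannot handle DIAMONDS:
        // this prints INCOMPATIBLE, which is the failure the ticket wants
        // a schema-level default to avoid.
        System.out.println(SchemaCompatibility
            .checkReaderWriterCompatibility(reader, writer)
            .getType());
    }
}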

