[ 
https://issues.apache.org/jira/browse/AVRO-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151515#comment-16151515
 ] 

Felix GV edited comment on AVRO-1340 at 9/2/17 2:52 PM:
--------------------------------------------------------

It seems to me that symbolAliases overlap a lot with the functionality of the 
fallbackSymbol. If the desired long-term direction is to have symbolAliases 
(which I'm still not convinced are useful, but I wouldn't mind having them 
anyway) then it may be less confusing overall to have JUST symbolAliases and 
not the fallbackSymbol as well.

Otherwise, having both creates yet another set of edge cases stemming from the 
combination of the two concepts. While it is definitely possible to come up 
with appropriate policies for all such edge cases and make the implementation 
compliant with those policies, I am still wary of the cognitive burden that it 
will place on developers. A simple API is a very valuable asset.

For example: let's take the HTTP response code logging example from above, 
assuming [~zolyfarkas]'s V3 as the starting point.

V3:
"name":"httpResponseCode",
"symbols":["UNKNOWN", "200", "404", "500", "300"],
"fallbackSymbol": "UNKNOWN"

V4:
"name":"httpResponseCode",
"symbols":["UNKNOWN", "200", "404", "500", "300"],
"symbolAliases":{"300":["301", "302"]},
"fallbackSymbol": "UNKNOWN"

V5:
"name":"httpResponseCode",
"symbols":["UNKNOWN", "200", "404", "500", "300", "301", "302"],
"symbolAliases":{"300":["301", "302"]},
"fallbackSymbol": "UNKNOWN"

V6:
"name":"httpResponseCode",
"symbols":["UNKNOWN", "200", "404", "500", "300", "301", "302"],
"fallbackSymbol": "UNKNOWN"

Which of these schemas are supposed to be compatible with one another, and what 
is the outcome of sending 301 between the various combinations? Let's take a 
stab at it:

V5->V3: Do I use the writer's translation rule or the reader's? If I use the 
writer's, then my 301 will be read as 300, if I use the reader's, then it'll be 
read as UNKNOWN.

V5->V4: Same fallback and aliases on both the reader and writer. Which of the 
two rules takes precedence over the other?

V5->V6: Both have 301 defined, so no special rules come into play.

V6->V3: 301 is definitely UNKNOWN.

V6->V4: Do I do the opposite of the V5->V3 translation? i.e.: writer's rule 
changes 301 to UNKNOWN, reader's rule changes 301 to 300. Or do we instead 
disregard reader and writer as the criterion, and rather give precedence to one 
type of rule over the other (aliases take precedence over fallback, no matter 
which side they're defined on, or vice versa)? Moving on.

V6->V5: Both have 301 defined, so no special rules come into play.
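
The six cases above can be sketched in a few lines of Python. This is only an 
illustration, not Avro code: symbolAliases and fallbackSymbol are the attributes 
proposed in this thread, and the helper names (alias_target, possible_outcomes) 
are made up for the example. The sketch does not pick a policy; it simply 
collects every result that some writer-side or reader-side rule could justify, 
so that a pair with more than one outcome is one that needs a precedence rule.

```python
# Hypothetical schema fragments mirroring V3..V6 from this comment.
BASE = ["UNKNOWN", "200", "404", "500", "300"]
ALIASES = {"300": ["301", "302"]}

SCHEMAS = {
    "V3": {"symbols": BASE, "fallbackSymbol": "UNKNOWN"},
    "V4": {"symbols": BASE, "symbolAliases": ALIASES,
           "fallbackSymbol": "UNKNOWN"},
    "V5": {"symbols": BASE + ["301", "302"], "symbolAliases": ALIASES,
           "fallbackSymbol": "UNKNOWN"},
    "V6": {"symbols": BASE + ["301", "302"], "fallbackSymbol": "UNKNOWN"},
}


def alias_target(schema, symbol):
    """Return the symbol that `symbol` is an alias of in `schema`, or None."""
    for target, aliases in schema.get("symbolAliases", {}).items():
        if symbol in aliases:
            return target
    return None


def possible_outcomes(writer, reader, symbol):
    """Collect every value the reader could plausibly end up with.

    If the reader knows the symbol, no special rule applies. Otherwise
    each side's alias rule and fallback rule is a candidate, as long as
    the result is a symbol the reader actually defines.
    """
    if symbol in reader["symbols"]:
        return {symbol}
    outcomes = set()
    for side in (writer, reader):
        target = alias_target(side, symbol)
        if target in reader["symbols"]:
            outcomes.add(target)
        fallback = side.get("fallbackSymbol")
        if fallback in reader["symbols"]:
            outcomes.add(fallback)
    return outcomes


PAIRS = [("V5", "V3"), ("V5", "V4"), ("V5", "V6"),
         ("V6", "V3"), ("V6", "V4"), ("V6", "V5")]

# Pairs where more than one outcome is defensible for the symbol 301.
AMBIGUOUS = [
    f"{w}->{r}" for w, r in PAIRS
    if len(possible_outcomes(SCHEMAS[w], SCHEMAS[r], "301")) > 1
]

for w, r in PAIRS:
    result = sorted(possible_outcomes(SCHEMAS[w], SCHEMAS[r], "301"))
    print(f"{w}->{r}: {result}")
```

Any pair that prints more than one candidate is one where the spec would have 
to say which rule wins, either by side (writer vs. reader) or by rule type 
(alias vs. fallback).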

--

Three of the above translations (V5->V3, V5->V4, V6->V4) have ambiguous 
behaviour. Do we decide some translation rules for each of them and mark them 
as compatible? Or do we mark them as incompatible? Either of these seems to 
leave a bitter taste.

I would much rather have just a single enum evolution mechanism, either 
fallback, or aliases, but not both.

In that regard, treating the two as separate issues may prevent us from making 
a holistic design choice.



> use default to allow old readers to specify default enum value when 
> encountering new enum symbols
> -------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-1340
>                 URL: https://issues.apache.org/jira/browse/AVRO-1340
>             Project: Avro
>          Issue Type: Improvement
>          Components: spec
>         Environment: N/A
>            Reporter: Jim Donofrio
>            Priority: Minor
>
> The schema resolution page says:
> > if both are enums:
> > if the writer's symbol is not present in the reader's enum, then an
> > error is signalled.
> This makes it difficult to use enums because you can never add an enum value 
> and keep old readers compatible. Why not use the default option to refer to 
> one of the enum values so that when an old reader encounters an enum ordinal 
> it does not recognize, it can default to the optional schema-provided one. If 
> the old schema does not provide a default then the older reader can continue 
> to fail as it does today.
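
The behaviour requested in the issue description can be sketched as follows. 
This is a rough illustration under assumptions, not the Avro implementation: 
resolve_enum is a hypothetical helper, and reader_default stands in for the 
optional default the description proposes.

```python
def resolve_enum(writer_symbol, reader_symbols, reader_default=None):
    """Resolve a writer's enum symbol against the reader's enum.

    Known symbols pass through. Unknown symbols map to the reader's
    default when one is provided, and otherwise signal an error, which
    is what the schema resolution rules quoted above describe.
    """
    if writer_symbol in reader_symbols:
        return writer_symbol
    if reader_default is not None:
        if reader_default not in reader_symbols:
            raise ValueError("default must be one of the reader's symbols")
        return reader_default
    raise ValueError(f"symbol {writer_symbol!r} not present in reader's enum")


old_reader = ["UNKNOWN", "200", "404", "500", "300"]

# With a default, an old reader tolerates a symbol added later.
with_default = resolve_enum("301", old_reader, reader_default="UNKNOWN")

# Without a default, the old reader keeps failing as it does today.
try:
    resolve_enum("301", old_reader)
    failed_without_default = False
except ValueError:
    failed_without_default = True
```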



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
