Thanks for the idea. I'm gonna play around with that to see if it could work.
Niels

On Tue, Jan 31, 2017 at 5:57 PM, Ryan Blue <rb...@netflix.com.invalid> wrote:

> If you want to solve this problem by using a String to encode the value,
> then you can do that by defining a logical type that is an enum-as-string.
> But I'm not sure you want to do that. The nice thing about an enum is that
> you use what you know about the schema ahead of time to get a much more
> compact representation -- usually a byte rather than encoding the entire
> string. So I'd much rather find a way of handling this case that keeps the
> compact representation, while allowing applications to handle these
> gracefully.
>
> For generic, enum symbols are translated to GenericEnumSymbol, which can
> hold any symbol. Adding an option to return the symbol from the writer's
> schema even if it isn't in the reader's schema is one way around the
> problem. That wouldn't work for reflect or specific, though.
>
> Another option that was suggested last year is to designate a catch-all
> enum symbol. So your enum would be { 'A', 'B', 'UNKNOWN' } and { 'A', 'B',
> 'C', 'UNKNOWN' }. When a v1 consumer reads v2 records, C gets turned into
> UNKNOWN.
>
> I like the designated catch-all symbol because it is a reasonable way to
> opt in to forward compatibility.
>
> rb
>
> On Tue, Jan 31, 2017 at 2:04 AM, Niels Basjes <ni...@basjes.nl> wrote:
>
> > Hi,
> >
> > I'm working on a project where we are putting serialized Avro records
> > into Kafka. The schemas are made available via a schema registry of
> > some sort.
> > Because Kafka stores the messages for a longer period (weeks), we have two
> > common scenarios that occur when a new version of the schema is introduced
> > (i.e. from V1 to V2).
> >
> > 1) A V2 producer is released and a V1 consumer must be able to read the
> > records.
> > 2) A 'new' V2 consumer is released a few days after the V2 producer started
> > creating records. The V2 consumer starts reading Kafka "from the beginning"
> > and as a consequence first has to go through a set of V1 records.
> >
> > So in this use case we need schema evolution in two directions.
> >
> > To make sure it all works as expected I did some experiments and found that
> > these requirements are all doable except when you need an enum.
> >
> > This 'two directions' requirement turns out to have a problem with changing
> > the values of an enum.
> >
> > You cannot write an enum { 'A', 'B', 'C' } and then read it with the schema
> > enum { 'A', 'B' }
> >
> >
> > So I was thinking about a possible way to make this easier for the
> > developer.
> >
> > The current idea that I want your opinion on:
> > 1) In the IDL we add a way of indicating that we want the enum to be stored
> > in a different way in the schema. I was thinking about something like
> > either defining a new type like 'string enum' or perhaps using an
> > annotation of some sort.
> > 2) The 'string enum' is mapped into the actual schema as a string (which
> > can contain ANY value). So anyone using the JSON schema can simply read it
> > because it is a string.
> > 3) The generated code that is used to set/change the value enforces that
> > only the allowed values can be set.
> >
> > This way a 'reader' can read any value, and the schema is compatible in all
> > directions.
> >
> > What do you guys think?
> > Is this an idea worth trying out?
> >
> > --
> > Best regards / Met vriendelijke groeten,
> >
> > Niels Basjes
> >
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

--
Best regards / Met vriendelijke groeten,

Niels Basjes
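
For reference, the catch-all symbol idea from the thread can be sketched roughly as follows. This is plain Python, not Avro's API: `resolve_enum` and its `catch_all` parameter are hypothetical names made up for illustration. The sketch only assumes that an enum value travels as an index into the writer's symbol list, which is what keeps the representation compact.

```python
from typing import Optional, Sequence


def resolve_enum(index: int,
                 writer_symbols: Sequence[str],
                 reader_symbols: Sequence[str],
                 catch_all: Optional[str] = None) -> str:
    """Resolve an enum value for the reader (hypothetical helper, not an Avro call).

    The value arrives as an index into the writer's symbol list. If the
    reader does not know the symbol, fall back to the designated catch-all
    symbol when one is given; otherwise fail.
    """
    symbol = writer_symbols[index]
    if symbol in reader_symbols:
        return symbol
    if catch_all is not None and catch_all in reader_symbols:
        return catch_all
    raise ValueError(f"enum symbol {symbol!r} is not in the reader's schema")


# The two enum versions from the thread.
V1 = ["A", "B", "UNKNOWN"]
V2 = ["A", "B", "C", "UNKNOWN"]

# A V1 consumer reads a V2 record carrying 'C': it is turned into UNKNOWN.
print(resolve_enum(V2.index("C"), V2, V1, catch_all="UNKNOWN"))  # prints UNKNOWN

# A V2 consumer reads an older V1 record: known symbols pass through unchanged.
print(resolve_enum(V1.index("B"), V1, V2, catch_all="UNKNOWN"))  # prints B
```

Without a catch-all, reading { 'A', 'B', 'C' } with { 'A', 'B' } raises an error in this sketch, which mirrors the problem described in the original message. The catch-all is opt-in: a schema that does not name one keeps the strict behaviour.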