Re: Avro union compatibility mode enhancement proposal

Matthieu Monsch Sat, 11 Jun 2016 19:49:23 -0700

Just realized I forgot to change the `id` fields to `long` in the first union 
(all IDs should be longs). Apologies for the confusion; the ID types don't 
matter at all in the example.




> On Jun 11, 2016, at 7:43 PM, Matthieu Monsch <mon...@alum.mit.edu> wrote:
> 
> Happy to provide an example. Let’s assume that we have a Kafka producer 
> emitting the following values:
> union {
>   record Vehicle {
>     int id;
>   },
>   record Car {
>     int id;
>     boolean selfDriving;
>   }
> }
> Later on, the system starts supporting a new kind of vehicle, which must be 
> added to the schema:
> 
> union {
>   record Vehicle {
>     long id;
>   },
>   record Car {
>     long id;
>     boolean selfDriving;
>   },
>   @aliases(["Vehicle"]) // Currently ignored when declared in the producer's schema.
>   record Bus {
>     long id;
>     int capacity;
>   }
> }
> We would like to be able to deploy the change to the producer without having 
> to migrate all the consumers: existing consumers would treat each Bus as a 
> Vehicle until they upgrade.
> 
> However, we can't do so under the current evolution rules, since the alias 
> is ignored. (It would work if we added the alias to each consumer's schema, 
> but this isn't practical since it would also require a global migration.) 
> Note also that we can't preemptively add aliases to the consumers, since the 
> names of the new records aren't known beforehand.
> 
> Allowing the consumers (readers) to use the producer's (writer's) aliases 
> would fix this. If we make sure that writer aliases are tried last (for 
> example, only falling back to them if neither the names nor the consumers' 
> aliases match), this doesn't change any currently allowed evolution; it only 
> extends the rules to support additional cases (without introducing any new 
> syntax).
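The lookup order proposed above can be sketched in Python. This is a minimal model under stated assumptions: `Record` and `resolve_branch` are hypothetical names, not Avro library APIs; a record is reduced to its name and aliases, and field-level resolution is ignored. In the writer-alias step, writer aliases are only compared against reader record names, which is one possible reading of the proposal.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    """Hypothetical stand-in for a named record schema: just a name and aliases."""
    name: str
    aliases: List[str] = field(default_factory=list)


def resolve_branch(writer: Record, reader_branches: List[Record],
                   use_writer_aliases: bool = True) -> Optional[Record]:
    """Pick the reader union branch that a writer record resolves to.

    Order: exact name, then reader aliases, then (if enabled) writer aliases.
    Returns None when nothing matches, i.e. where resolution fails today.
    """
    # 1. Exact name match (current behavior).
    for branch in reader_branches:
        if branch.name == writer.name:
            return branch
    # 2. Reader-side aliases (current behavior).
    for branch in reader_branches:
        if writer.name in branch.aliases:
            return branch
    # 3. Writer-side aliases (proposed fallback, tried last).
    if use_writer_aliases:
        for alias in writer.aliases:
            for branch in reader_branches:
                if branch.name == alias:
                    return branch
    return None


# An old consumer's union, which has never heard of Bus.
reader_union = [Record("Vehicle"), Record("Car")]
# The new producer's Bus record, carrying an alias to Vehicle.
bus = Record("Bus", aliases=["Vehicle"])

print(resolve_branch(bus, reader_union, use_writer_aliases=False))  # None
print(resolve_branch(bus, reader_union).name)                       # Vehicle
print(resolve_branch(Record("Car"), reader_union).name)             # Car
```

Because the writer-alias step runs only after both existing steps fail, any writer/reader pair that resolves today resolves to the same branch under this sketch.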
> 
> Does this make sense?
> 
> -Matthieu
> 
> PS: In case it's more readable, this example is also available here: 
> https://gist.github.com/mtth/527318445e5b52bfd491c0483ff5f9d3
> 
> 
> 
>> On Jun 10, 2016, at 2:00 PM, Doug Cutting <cutt...@gmail.com> wrote:
>> 
>> Matthieu,
>> 
>> Can you please provide an example of how this would work?
>> 
>> Thanks,
>> 
>> Doug
>> 
>> On Thu, Jun 9, 2016 at 6:47 PM, Matthieu Monsch <mon...@alum.mit.edu> wrote:
>> 
>>> Thinking about this a bit more (and a couple months later…), maybe there
>>> is a simpler alternative.
>>> 
>>> Currently, one reason writer evolution is hard (the union issue
>>> described below is a special case of this) is that aliases are only used on
>>> the reader side. Why not also allow readers to use the writer's aliases?
>>> 
>>> Resolution would first be done on names, then fall back to reader aliases,
>>> and finally fall back to writer aliases. In the example below, adding an
>>> alias to the base record inside each new record would be enough for
>>> evolution to work.
>>> 
>>> -Matthieu
>>> 
>>> 
>>> 
>>>> On Apr 22, 2016, at 8:42 AM, Matthieu Monsch <mon...@alum.mit.edu> wrote:
>>>> 
>>>> The second solution sounds like a great alternative.
>>>> 
>>>> Branch aliases are more straightforward than an implicit order-sensitive
>>>> policy. They also have the additional benefit of giving users a bit more
>>>> flexibility: since defaults are specified on the branches’ types, different
>>>> branches inside the same union can have different defaults. There are
>>>> probably a few edge cases (e.g. allowing multiple such aliases would be
>>>> useful), but they should be simple to address.
>>>> 
>>>> What would be a good attribute name for this? `baseTypes`?
>>>> 
>>>> -Matthieu
>>>> 
>>>> 
>>>> 
>>>>> On Apr 21, 2016, at 10:52 AM, Doug Cutting <cutt...@gmail.com> wrote:
>>>>> 
>>>>> On Wed, Apr 20, 2016 at 9:09 PM, Ryan Blue <rb...@netflix.com.invalid> wrote:
>>>>>> Making the default a property of an inner schema makes me think that
>>>>>> we will have to deal with multiple schemas with such a label at some
>>>>>> point.
>>>>> 
>>>>> On Thu, Apr 21, 2016 at 6:54 AM, Matthieu Monsch <mon...@alum.mit.edu> wrote:
>>>>>> Delegating default selection to the branches themselves is a great
>>>>>> idea, but it will be tricky to handle reference branches smoothly. More
>>>>>> minor, but it also doesn't feel intuitive for the union not to “own” its
>>>>>> default attribute.
>>>>> 
>>>>> If I understand your concerns correctly, I attempted to address this
>>>>> above:
>>>>> 
>>>>> "Note however that, when using a record as the default branch, one
>>>>> could not then use that same record as a non-default branch in another
>>>>> union. To ameliorate that, we might permit multiple branches in a union
>>>>> to be marked as default, with the convention that the first such branch
>>>>> is used."
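To make the "first default wins" convention concrete, here is a hypothetical Python model in which the default marker is carried by the record itself, as in the quoted proposal, so a record marked default is marked default in every union it appears in. `RecordSchema`, its `default` flag, and `fallback_branch` are illustrative names only; no such attribute exists in the Avro specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RecordSchema:
    """Hypothetical record schema; `default` is a made-up marker on the record itself."""
    name: str
    default: bool = False


def fallback_branch(writer_name: str,
                    reader_union: List[RecordSchema]) -> Optional[RecordSchema]:
    """Resolve by exact name; otherwise use the first branch marked default."""
    for record in reader_union:
        if record.name == writer_name:
            return record
    # Several branches may be marked default; by convention the first one wins.
    for record in reader_union:
        if record.default:
            return record
    return None


vehicle = RecordSchema("Vehicle", default=True)
truck = RecordSchema("Truck", default=True)
car = RecordSchema("Car")

union_a = [car, vehicle]
union_b = [truck, car, vehicle]  # Vehicle is marked default but is not first

print(fallback_branch("Bus", union_a).name)      # Vehicle
print(fallback_branch("Bus", union_b).name)      # Truck
print(fallback_branch("Vehicle", union_b).name)  # Vehicle
```

In `union_b`, `Vehicle` still appears and still matches by name; it simply stops acting as the fallback because `Truck` is listed first.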
>>>>> 
>>>>> Does that make sense?
>>>>> 
>>>>> This isn't ideal syntax, but it's not terrible, and it doesn't change
>>>>> schema syntax incompatibly, which seems important, especially since it's
>>>>> unlikely that all implementations would adopt such a syntax change in a
>>>>> synchronized manner.
>>>>> 
>>>>> Alternatively, one might annotate each derived record with the name of
>>>>> its base record; then one wouldn't need to alter union definitions.
>>>>> This would work like an alias. If a record doesn't exist in the
>>>>> reader's schema, then an alias for the missing record would be added,
>>>>> in the reader's schema, to the base record that the writer's schema
>>>>> names for it. Aliases work by rewriting the writer's schema at read
>>>>> time, updating names, including those in unions. Might that work? It
>>>>> seems like a more elegant approach. It has compatible syntax and only
>>>>> alters the behavior of a case that fails today.
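The base-record alternative can be sketched as two steps: derive extra reader-side aliases from the writer's annotations, then rename writer branches through those aliases, the way alias resolution rewrites the writer's schema. This is a hypothetical Python model: the annotation is represented as a plain `writer_bases` mapping (its real attribute name is not settled in this thread; `baseTypes` is floated later), and both function names are illustrative.

```python
from typing import Dict, List, Optional, Set


def add_implied_aliases(reader_aliases: Dict[str, Set[str]],
                        writer_bases: Dict[str, Optional[str]]) -> Dict[str, Set[str]]:
    """Return a copy of the reader's alias table, extended with aliases implied
    by the writer's (hypothetical) base-record annotations.

    reader_aliases: reader record name -> aliases the reader declares.
    writer_bases:   writer record name -> name of its base record, or None.
    """
    result = {name: set(aliases) for name, aliases in reader_aliases.items()}
    for writer_name, base in writer_bases.items():
        if writer_name not in result and base is not None and base in result:
            # Alias the missing writer record onto the reader's base record.
            result[base].add(writer_name)
    return result


def rewrite_writer_names(writer_names: List[str],
                         reader_aliases: Dict[str, Set[str]]) -> List[str]:
    """Rename writer branches to reader names via aliases; unknown names are kept."""
    rewritten = []
    for name in writer_names:
        if name in reader_aliases:
            rewritten.append(name)
            continue
        matches = [r for r, aliases in reader_aliases.items() if name in aliases]
        rewritten.append(matches[0] if matches else name)
    return rewritten


reader = {"Vehicle": set(), "Car": set()}
writer = {"Vehicle": None, "Car": None, "Bus": "Vehicle"}

aliases = add_implied_aliases(reader, writer)
print(rewrite_writer_names(["Vehicle", "Car", "Bus"], aliases))
# ['Vehicle', 'Car', 'Vehicle']
```

Without the annotation, `Bus` is left unrenamed and would fail to resolve exactly as it does today, which matches the claim that only a currently failing case changes behavior.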
>>>>> 
>>>>> Doug
>>>> 
>>> 
>>> 
>
