Thanks for the feedback, Doug! I think the change would still fit under the "formerly known as" umbrella. More specifically, the current implementation provides a "formerly read as" relationship, and the proposed change would also allow the "formerly written as" counterpart.
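To make the two directions concrete, here is a minimal sketch of the lookup order this implies. It is written in Python rather than Java and is not based on the Avro API: `Record` and `resolve` are hypothetical names used only for illustration. It tries names first, then the reader's aliases (today's behavior), then the writer's aliases (the proposed fallback).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a named schema; not the Avro API."""
    name: str
    aliases: Tuple[str, ...] = ()


def resolve(writer: Record, readers: List[Record]) -> Optional[Record]:
    """Return the reader record that matches `writer`, or None."""
    # 1. Exact name match.
    for reader in readers:
        if reader.name == writer.name:
            return reader
    # 2. Reader aliases ("formerly read as"): what aliases do today.
    for reader in readers:
        if writer.name in reader.aliases:
            return reader
    # 3. Writer aliases ("formerly written as"): the proposed fallback,
    #    tried last so existing resolutions are unaffected.
    for reader in readers:
        if reader.name in writer.aliases:
            return reader
    return None


event = Record("Event")
detailed = Record("DetailedEvent", aliases=("Event",))

# Readers switched first: the reader's alias matches the writer's name.
assert resolve(event, [detailed]) is detailed
# Writers switched first: only the writer-alias fallback matches.
assert resolve(detailed, [event]) is event
```

Because the writer-alias step only runs when the first two fail, any pair of schemas that resolves today would resolve to the same reader record under this sketch.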

Perhaps a new example would help make this explicit (I picked the first example to show that we could use this to solve the thread's initial union evolution issue but that also made the underlying logic less explicit). Assume we have the following records:

    // Old schema.
    record Event {
      long id;
    }

    // New schema.
    @aliases(["Event"])
    record DetailedEvent {
      long id;
      string detail = "";
    }

Schema evolution between readers and writers can happen in two ways:

- Readers switch first to the new DetailedEvent. This is currently supported by the alias (the reader is aware that the DetailedEvent was "formerly read as" Event).
- Writers switch first, and readers now become unable to read the new data. There is no way currently for the writer to communicate the "formerly written as" relationship.

Phrasing it differently, aliases are currently unidirectional and this change makes them bidirectional. I feel the added symmetry makes them slightly easier to understand as well: no need to remember which of the reader's or writer's schema's aliases are taken into account (with the caveat that if both are defined, some priority must be enforced).

What do you think?

I'm not very familiar with the Java implementation of schema resolution but from looking at the code it looks like it should be straightforward to make this work. For example we could allow the branch name to be passed internally to avoid duplicates. Or maybe something even simpler since it doesn't seem like such "resolving schemas" (created by the applyAliases method) can ever be used to write (would it even make sense to?) since they are created by ResolvingDecoders which don't expose them afterwards. Unless I missed something?

-Matthieu

> On Jun 14, 2016, at 10:40 AM, Doug Cutting <cutt...@gmail.com> wrote:
>
> Matthieu,
>
> Thanks for the example.
>
> First, is this really an alias, or is it something else? In other
> words, would a reader ever map a written Vehicle to a Bus?
> If the use cases are exclusive, perhaps we should call it something
> different rather than overload the alias concept?
>
> Second, would the alias implementation, rewriting the writer's schema,
> work here? It would result in a union with two, different, Vehicle
> records. That could probably be made to work, but any other
> references to the Vehicle schema might become ambiguous. I suspect
> the implementation may end up being quite different.
>
> Aliases currently mean, "formerly known as", this feature seems more
> like, "a kind of".
>
> Doug
>
> On Sat, Jun 11, 2016 at 7:43 PM, Matthieu Monsch <mon...@alum.mit.edu> wrote:
>> Happy to provide an example. Let’s assume that we have a Kafka producer
>> emitting the following values:
>>
>> union {
>>   record Vehicle {
>>     int id;
>>   },
>>   record Car {
>>     int id;
>>     boolean selfDriving;
>>   }
>> }
>>
>> At a later point in time, a new vehicle becomes supported by the system and
>> must be added to the schema:
>>
>> union {
>>   record Vehicle {
>>     long id;
>>   },
>>   record Car {
>>     long id;
>>     boolean selfDriving;
>>   },
>>   @aliases(["Vehicle"]) // Ignored when on the producer's schema.
>>   record Bus {
>>     long id;
>>     int capacity;
>>   }
>> }
>>
>> We would like to be able to deploy the change to the producer without having
>> to migrate all the consumers: existing consumers would treat each Bus as a
>> Vehicle until they upgrade.
>>
>> However we can't do so under the current evolution rules since the alias is
>> ignored (it would work if we added the alias to each consumer's schema but
>> this isn't practical since it would also require a global migration). Note
>> also that we can't preemptively add aliases on the consumers since the names
>> of the records aren't known beforehand.
>>
>> Allowing the consumers (readers) to use the producer's (writer’s) aliases
>> would fix this.
>> If we make sure that writer aliases are used last (for
>> example only falling back to them if neither the names nor the consumers'
>> aliases match), this doesn't change any of the current allowed evolution
>> rules and expands them to support additional cases (without introducing any
>> new syntax).
>>
>> Does this make sense?
>>
>> -Matthieu
>>
>> Ps: In case it’s more readable, this example can also be read here:
>> https://gist.github.com/mtth/527318445e5b52bfd491c0483ff5f9d3
>>
>>> On Jun 10, 2016, at 2:00 PM, Doug Cutting <cutt...@gmail.com> wrote:
>>>
>>> Matthieu,
>>>
>>> Can you please provide an example of how this would work?
>>>
>>> Thanks,
>>>
>>> Doug
>>>
>>> On Thu, Jun 9, 2016 at 6:47 PM, Matthieu Monsch <mon...@alum.mit.edu> wrote:
>>>
>>>> Thinking about this a bit more (and a couple months later…), maybe there
>>>> is a simpler alternative.
>>>>
>>>> Currently, a reason why writer evolution is hard (the union issue
>>>> described below is a special case of this) is that aliases are only used on
>>>> the reader side. Why not also allow readers to use the writer’s aliases?
>>>>
>>>> Resolution would first be done on names, then fall back to reader aliases,
>>>> and finally fall back to writer aliases. In the example below, it would be
>>>> enough to add an alias to the base record inside any new records to have
>>>> evolution work.
>>>>
>>>> -Matthieu
>>>>
>>>>> On Apr 22, 2016, at 8:42 AM, Matthieu Monsch <mon...@alum.mit.edu> wrote:
>>>>>
>>>>> The second solution sounds like a great alternative.
>>>>>
>>>>> Branch aliases are more straightforward than an implicit order-sensitive
>>>>> policy.
>>>>> They also have the additional benefit of giving users a bit more
>>>>> flexibility: since defaults are specified on the branches’ types, it is
>>>>> possible to have different branches have different defaults inside the same
>>>>> union. There are probably a few edge cases (e.g. allowing multiple such
>>>>> aliases would be useful) but they should be simple to address.
>>>>>
>>>>> What would be a good attribute name for this? `baseTypes`?
>>>>>
>>>>> -Matthieu
>>>>>
>>>>>> On Apr 21, 2016, at 10:52 AM, Doug Cutting <cutt...@gmail.com> wrote:
>>>>>>
>>>>>> On Wed, Apr 20, 2016 at 9:09 PM, Ryan Blue <rb...@netflix.com.invalid> wrote:
>>>>>>> Making the default a property of an
>>>>>>> inner schema makes me think that we will have to deal with multiple
>>>>>>> schemas with such a label at some point.
>>>>>>
>>>>>> On Thu, Apr 21, 2016 at 6:54 AM, Matthieu Monsch <mon...@alum.mit.edu> wrote:
>>>>>>> Delegating default selection to the branches themselves is a great
>>>>>>> idea but it will be tricky to handle reference branches smoothly.
>>>>>>> More minor but it also doesn’t feel intuitive to not have the union
>>>>>>> “own” its default attribute.
>>>>>>
>>>>>> If I understand your concerns correctly, I attempted to address this
>>>>>> above:
>>>>>>
>>>>>> "Note however that, when using a record as the default branch, one
>>>>>> could not then use that same record as a non-default branch in
>>>>>> another union. To ameliorate that, we might permit multiple default
>>>>>> branches in a union to be specified as default with the convention
>>>>>> that the first such is used."
>>>>>>
>>>>>> Does that make sense?
>>>>>>
>>>>>> This isn't ideal syntax, but it's not terrible, and it doesn't change
>>>>>> schema syntax incompatibly, which seems important, especially when it's
>>>>>> unlikely that all implementations would implement such a syntax change
>>>>>> in a synchronized manner.
>>>>>>
>>>>>> Alternately, one might annotate each derived record with the name of
>>>>>> its base record, then one wouldn't need to alter union definitions.
>>>>>> This would work like an alias. If a record doesn't exist in the
>>>>>> reader's schema, then an alias to the missing record would be added in
>>>>>> the reader's schema to the base record it names in the writer's
>>>>>> schema. Aliases work by rewriting the writer's schema at read-time,
>>>>>> updating names, including those in unions. Might that work? It seems
>>>>>> like perhaps a more elegant approach. It has compatible syntax and
>>>>>> only alters behavior of a case that fails today.
>>>>>>
>>>>>> Doug