Re: Avro union compatibility mode enhancement proposal

Matthieu Monsch Sun, 17 Apr 2016 14:00:37 -0700

Great!

Summarizing:


+ For `enum`s, we can go with the approach described in AVRO-1340 [1]. The only 
missing step is to agree on the `.avsc` and `.avdl` syntax (can be discussed in 
that ticket).
+ For unions, we will add an optional catch-all attribute to mark a branch as 
resolution target when no names or aliases match (and come up with the 
corresponding syntax).

Does this sound like a good way forward?

[1] https://issues.apache.org/jira/browse/AVRO-1340



> On Apr 17, 2016, at 10:14 AM, Ryan Blue <[email protected]> wrote:
> 
> Thanks, Zoltan! Good to know there's an issue open for the enum enhancement
> and that you already suggested the fix we came up with in this thread. Lets
> use AVRO-1340 to track this.
> 
> rb
> 
> On Sun, Apr 17, 2016 at 4:49 AM, Zoltan Farkas <[email protected]
>> wrote:
> 
>> +1 , it would resolve AVRO-1340 as well!
>> 
>> --z
>> 
>> 
>> 
>>> On Apr 16, 2016, at 10:26 PM, Ryan Blue <[email protected]>
>> wrote:
>>> 
>>> +1
>>> 
>>> On Sat, Apr 16, 2016 at 5:54 PM, Matthieu Monsch <[email protected]>
>>> wrote:
>>> 
>>>>> I think it may make sense for
>>>>> enums to add a special value for it.
>>>> 
>>>> That would work great. To give slightly more flexibility to users, we
>>>> could even allow the reader’s schema to explicitly specify which symbol
>> to
>>>> use when it reads an unknown symbol. If not specified, resolution would
>>>> fail (consistent with the current behavior).
>>>> 
>>>> For example (assuming the corresponding key is called “onUnknown”):
>>>> 
>>>>> {
>>>>> “type”: “enum”,
>>>>> “name”: “Platform”,
>>>>> “symbols”: [“UNSUPPORTED”, “ANDROID”, “IOS”],
>>>>> “onUnknown”: “UNSUPPORTED"
>>>>> }
>>>> 
>>>> This `enum` would then be able to resolve schemas with extra symbols
>> (read
>>>> as `UNSUPPORTED`).
>>>> 
>>>> 
>>>> 
>>>>>> On Apr 16, 2016, at 1:36 PM, Ryan Blue <[email protected]>
>>>>> wrote:
>>>>> 
>>>>> Matthieu, how would that work with enums? I think it may make sense for
>>>>> enums to add a special value for it.
>>>>> 
>>>>> rb
>>>>> 
>>>>> On Sat, Apr 16, 2016 at 1:23 PM, Matthieu Monsch <[email protected]>
>>>>> wrote:
>>>>> 
>>>>>>> I think that substituting different data for
>>>>>>> unrecognized branches in a union isn't the way to fix the problem,
>> but
>>>> I
>>>>>>> have a proposal for a way to fix it that looks more like you'd expect
>>>> in
>>>>>>> the OO example above: by adding the Vehicle class to your read
>> schema.
>>>>>>> 
>>>>>>> Right now, unions are first resolved by full class name as required
>> by
>>>>>> the
>>>>>>> spec. But after that, we have some additional rules to match schemas.
>>>>>> These
>>>>>>> rules are how we reconcile name differences from situations like
>>>> writing
>>>>>>> with a generic class and reading with a specific class. I'm proposing
>>>> you
>>>>>>> use a catch-all class (the superclass) with fields that are in all of
>>>> the
>>>>>>> union's branches, and we update schema resolution to allow it.
>>>>>> 
>>>>>> 
>>>>>> That sounds good. The only thing I’d add is to make this catch-all
>>>>>> behavior explicit in the schema (similar to how field defaults must be
>>>>>> explicitly added).
>>>>>> To help fix another common writer evolution issue, we could also add a
>>>>>> similar catch-all for `enum`s (optional, to be explicitly specified in
>>>> the
>>>>>> schema).
>>>>>> -Matthieu
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>>> On Apr 12, 2016, at 2:41 PM, Ryan Blue <[email protected]>
>>>>>>> wrote:
>>>>>>> 
>>>>>>> Yacine,
>>>>>>> 
>>>>>>> Thanks for the extra information. Sorry for my delay in replying, I
>>>>>> wanted
>>>>>>> to time think about your suggestion.
>>>>>>> 
>>>>>>> I see what you mean that you can think of a union as the superclass
>> of
>>>>>> its
>>>>>>> options. The reflect object model has test code that does just that,
>>>>>> where
>>>>>>> classes B and C inherit from A and the schema for A is created as a
>>>> union
>>>>>>> of B and C. But, I don't think that your suggestion aligns with the
>>>>>>> expectations of object oriented design. Maybe that's an easier way to
>>>>>>> present my concern:
>>>>>>> 
>>>>>>> Say I have a class, Vehicle, with subclasses Car and Truck. I have
>>>>>>> applications that work with my dataset, the vehicles that my company
>>>>>> owns,
>>>>>>> and we buy a bus. If I add a Bus class, what normally happens is that
>>>> an
>>>>>>> older application can work with it. A maintenance tracker would call
>>>>>>> getVehicles and can still get the lastMaintenanceDate for my Bus,
>> even
>>>>>>> though it doesn't know about busses. But what you suggest is that it
>> is
>>>>>>> replaced with a default, say null, in cases like this.
>>>>>>> 
>>>>>>> I think that the problem is that Avro no equivalent concept of
>>>>>> inheritance.
>>>>>>> There's only one way to represent it for what you need right now,
>> like
>>>>>>> Matthieu suggested. I think that substituting different data for
>>>>>>> unrecognized branches in a union isn't the way to fix the problem,
>> but
>>>> I
>>>>>>> have a proposal for a way to fix it that looks more like you'd expect
>>>> in
>>>>>>> the OO example above: by adding the Vehicle class to your read
>> schema.
>>>>>>> 
>>>>>>> Right now, unions are first resolved by full class name as required
>> by
>>>>>> the
>>>>>>> spec. But after that, we have some additional rules to match schemas.
>>>>>> These
>>>>>>> rules are how we reconcile name differences from situations like
>>>> writing
>>>>>>> with a generic class and reading with a specific class. I'm proposing
>>>> you
>>>>>>> use a catch-all class (the superclass) with fields that are in all of
>>>> the
>>>>>>> union's branches, and we update schema resolution to allow it. So
>> you'd
>>>>>>> have...
>>>>>>> 
>>>>>>> record Vehicle {
>>>>>>> long lastMaintenanceDate;
>>>>>>> }
>>>>>>> 
>>>>>>> record Car {
>>>>>>> long lastMaintenanceDate;
>>>>>>> int maxChildSeats;
>>>>>>> }
>>>>>>> 
>>>>>>> record Truck {
>>>>>>> long lastMainetanceDate;
>>>>>>> long lastWashedDate;
>>>>>>> }
>>>>>>> 
>>>>>>> Write schema: [null, Car, Truck] Read schema: [null, Car, Vehicle].
>> The
>>>>>>> catch-all class could provide the behavior you want, without silently
>>>>>>> giving you incorrect data.
>>>>>>> 
>>>>>>> Does that sound like a reasonable solution for your use case?
>>>>>>> 
>>>>>>> rb
>>>>>>> 
>>>>>>> On Tue, Mar 29, 2016 at 10:03 PM, Matthieu Monsch <
>> [email protected]
>>>>> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi Yacine,
>>>>>>>> 
>>>>>>>> I believe Ryan was mentioning that if you start from a schema
>>>> `[“null”,
>>>>>>>> “Car”]` then rather than add a new bus branch to the union, you
>> could
>>>>>>>> update the car’s schema to include new bus fields. For example
>> (using
>>>>>> IDL
>>>>>>>> notation):
>>>>>>>> 
>>>>>>>>> // Initial approach
>>>>>>>>> // Evolved to union { null, Car, Bus }
>>>>>>>>> record Car {
>>>>>>>>> string vin;
>>>>>>>>> }
>>>>>>>>> record Bus {
>>>>>>>>> string vin;
>>>>>>>>> int capacity;
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> // Alternative approach
>>>>>>>>> // Here evolution wouldn't change the union { null, Vehicle }
>>>>>>>>> // Note also that this is actually a compatible evolution from the
>>>>>> first
>>>>>>>> approach, even under current rules (using aliases for name changes).
>>>>>>>>> record Vehicle {
>>>>>>>>> string vin; // (Would be optional if not all branches had it.)
>>>>>>>>> union { null, int } capacity; // The new field is added here.
>>>>>>>>> }
>>>>>>>> 
>>>>>>>> Pushing this further, you could also directly start upfront with
>> just
>>>> a
>>>>>>>> “Vehicle” record schema with an optional `car` field and add new
>>>>>> optional
>>>>>>>> fields as necessary (for example a `bus` field). This gives you much
>>>>>>>> greater flexibility for evolving readers and writers separately (but
>>>> you
>>>>>>>> lose the ability to enforce that at most one field is ever present
>> in
>>>>>> the
>>>>>>>> record).
>>>>>>>> 
>>>>>>>> I agree that Avro’s current evolution rules are more geared towards
>>>>>> reader
>>>>>>>> evolution: evolving writers while keeping readers constant is
>>>> difficult
>>>>>>>> (namely because they don't support adding branches to unions or
>>>> symbols
>>>>>> to
>>>>>>>> enums). However, adding a completely new and separate "compatibility
>>>>>> mode”
>>>>>>>> has a high complexity cost; both for implementors (the separate
>>>> classes
>>>>>> you
>>>>>>>> mention) and users (who must invoke them specifically). It would be
>>>>>> best to
>>>>>>>> keep evolution rules universal.
>>>>>>>> 
>>>>>>>> Maybe we could extend the logic behind field defaults? Enum schemas
>>>>>> could
>>>>>>>> simply contain a default symbol attribute. For unions, it’s a bit
>>>>>> trickier,
>>>>>>>> but we should be able to get around with a default branch attribute
>>>>>> which
>>>>>>>> would allow evolution if and only if the newly added branches can be
>>>>>> read
>>>>>>>> as the default branch. Assuming the attribute names don’t already
>>>> exist,
>>>>>>>> this would be compatible with the current behavior.
>>>>>>>> 
>>>>>>>> I think this use-case is common enough that it’d be worth exploring
>>>> (for
>>>>>>>> example, Rest.li [1] adds an `$UNKNOWN` symbol to somewhat address
>>>> this
>>>>>>>> issue; though not completely unfortunately [2]).
>>>>>>>> 
>>>>>>>> -Matthieu
>>>>>>>> 
>>>>>>>> [1] https://github.com/linkedin/rest.li
>>>>>>>> [2]
>>>> 
>> https://github.com/linkedin/rest.li/wiki/Snapshots-and-Resource-Compatibility-Checking#why-is-adding-to-an-enum-considered-backwards-incompatible
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>>> On Mar 29, 2016, at 2:40 AM, Yacine Benabderrahmane <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> Hi Ryan,
>>>>>>>>> 
>>>>>>>>> Just a little up^^. Could you please (or anyone else) give me a
>>>> little
>>>>>>>>> feedback ?
>>>>>>>>> Thanks in advance.
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> Yacine.
>>>>>>>>> 
>>>>>>>>> 2016-03-21 17:36 GMT+01:00 Yacine Benabderrahmane <
>>>>>>>> [email protected]
>>>>>>>>>> :
>>>>>>>>> 
>>>>>>>>>> Hi Ryan,
>>>>>>>>>> 
>>>>>>>>>> Thank you for giving feedback. I will try in the following to
>>>> provide
>>>>>>>> you
>>>>>>>>>> some more details about the addressed problem.
>>>>>>>>>> 
>>>>>>>>>> But before that, just a brief reminder of the context. Avro has
>> been
>>>>>>>>>> chosen in this project (and by many other ones for sure)
>> especially
>>>>>> for
>>>>>>>> a
>>>>>>>>>> very important feature: enabling forward and backward
>> compatibility
>>>>>>>>>> management through schema life-cycle. Our development model
>> involves
>>>>>>>>>> intensive usage of this feature, and many heavy developments are
>>>> made
>>>>>> in
>>>>>>>>>> parallel streams inside feature teams that share the same schema,
>>>>>>>> provided
>>>>>>>>>> the evolution of the latter complies with the stated compatibility
>>>>>>>> rules.
>>>>>>>>>> This implies that all the entities supported by the Avro schema
>> must
>>>>>>>>>> support the two-way compatibility, including unions. However, in
>> the
>>>>>>>>>> special case of the union, this two-way compatibility is not well
>>>>>>>> supported
>>>>>>>>>> by the current rules. Let me explain you the basement of our point
>>>> of
>>>>>>>> view,
>>>>>>>>>> it remains quite simple.
>>>>>>>>>> 
>>>>>>>>>> The use case is to have, for example,
>>>>>>>>>> - a first union version A:
>>>>>>>>>>  { "name": "Vehicle",
>>>>>>>>>>    "type": ["null", "Car"] }
>>>>>>>>>> - a new version of it B:
>>>>>>>>>>  { "name": "Vehicle",
>>>>>>>>>>    "type": ["null", "Car", "Bus"] }
>>>>>>>>>> For being forward compatible, an evolution of the union schema
>> must
>>>>>>>>>> guarantee that an old reader reading with A can read the data
>>>> written
>>>>>>>> with
>>>>>>>>>> the new schema B. Getting an error just means that the forward
>>>>>>>>>> compatibility feature is broken. But this is not actually the case
>>>>>> (and
>>>>>>>>>> this behavior is not suitable), because the old reader has a
>> correct
>>>>>>>> schema
>>>>>>>>>> and this schema has evolved naturally to version B to incorporate
>> a
>>>>>> new
>>>>>>>>>> Vehicle type. Not knowing this new type must not produce an error,
>>>> but
>>>>>>>> just
>>>>>>>>>> give the reader a default value, which means: "Either the data is
>>>> not
>>>>>>>> there
>>>>>>>>>> or you do not know how to handle it".
>>>>>>>>>> 
>>>>>>>>>> This is thought while keeping in mind that in an object-oriented
>>>> code
>>>>>>>>>> modeling, a union field is seen as a class member with the higher
>>>>>> level
>>>>>>>>>> generic type ("Any" (scala) or "Object" (java5+)...). Therefore,
>> it
>>>> is
>>>>>>>>>> natural for a modeler / programmer to implement the ability of not
>>>>>>>> getting
>>>>>>>>>> the awaited types and using some default value of known type. To
>>>> give
>>>>>> a
>>>>>>>>>> more complete specification, the new mode of compatibility has to
>>>>>> impose
>>>>>>>>>> one rule: the union default value must not change through versions
>>>> and
>>>>>>>> the
>>>>>>>>>> corresponding type must be placed at the top of the types list.
>> This
>>>>>> is
>>>>>>>>>> much easier to handle by development streams, because it is
>>>> addressed
>>>>>>>> once
>>>>>>>>>> for all in the very beginning of the schema life-cycle, than the
>>>> fact
>>>>>> to
>>>>>>>>>> oblige a number of teams, among which some are just not in place
>>>>>>>> anymore,
>>>>>>>>>> to update the whole code just because another dev team has
>> deployed
>>>> a
>>>>>>>> new
>>>>>>>>>> version of the union in the schema.
>>>>>>>>>> 
>>>>>>>>>> Now, for being backward compatible, the reader with B must always
>> be
>>>>>>>> able
>>>>>>>>>> to read data written with schema A. Even if the type included in
>> the
>>>>>>>> data
>>>>>>>>>> is not known, so it gets the default value and not an error.
>>>>>>>>>> 
>>>>>>>>>> I understand that getting an error could make sense when the
>>>> requested
>>>>>>>>>> field is not present. However, this behavior:
>>>>>>>>>> 
>>>>>>>>>> - is very restrictive, meaning: this obliges the old reader to
>>>> update
>>>>>>>>>> its code for integrating the new schema, while he is not managing
>> to
>>>>>>>> do it
>>>>>>>>>> for many reasons: development stream of next delivery is not
>>>>>>>> finished, or
>>>>>>>>>> not engaged, or not even planned - in the case of old and stable
>>>> code
>>>>>>>>>> - breaks the forward compatibility feature: the older reader is
>> not
>>>>>>>>>> able to read the new version of the union without getting an error
>>>>>>>>>> - breaks the backward compatibility feature: the new reader is not
>>>>>>>>>> able to read an old version containing unknown types of the union
>>>>>>>> without
>>>>>>>>>> getting an error
>>>>>>>>>> 
>>>>>>>>>> By the way, what do you exactly mean by "pushing evolution lower"
>>>> and
>>>>>>>>>> "update the record"? Could you please give me an example of the
>>>> trick
>>>>>>>> you
>>>>>>>>>> are talking about?
>>>>>>>>>> 
>>>>>>>>>> Just to be a bit more precise, we are not targeting to use a
>>>> "trick".
>>>>>>>> This
>>>>>>>>>> life-cycle management should be included in a standard so to keep
>>>> the
>>>>>>>>>> software development clean, production safe and compliant with a
>>>>>> complex
>>>>>>>>>> product road-map.
>>>>>>>>>> 
>>>>>>>>>> Finally, you seem to be concerned by the "significant change to
>> the
>>>>>>>>>> current evolution rules". Well, we actually do not change these
>>>> rules,
>>>>>>>> they
>>>>>>>>>> keep just the same. All we are proposing is to introduce a *mode*
>>>>>> where
>>>>>>>>>> the rules of union compatibility change. This mode is materialized
>>>> by
>>>>>> a
>>>>>>>>>> minimum and thin impact of the existing classes without any change
>>>> in
>>>>>>>> the
>>>>>>>>>> behavior, all the logic of the new compatibility mode is
>> implemented
>>>>>> by
>>>>>>>> new
>>>>>>>>>> classes that must be invoked specifically. But you would better
>> see
>>>> it
>>>>>>>> in
>>>>>>>>>> the code patch.
>>>>>>>>>> 
>>>>>>>>>> Looking forward to reading your feedback and answers.
>>>>>>>>>> 
>>>>>>>>>> Regards,
>>>>>>>>>> Yacine.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 2016-03-17 19:00 GMT+01:00 Ryan Blue <[email protected]>:
>>>>>>>>>> 
>>>>>>>>>>> Hi Yacine,
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for the proposal. It sounds interesting, but I want to
>> make
>>>>>> sure
>>>>>>>>>>> there's a clear use case for this because it's a significant
>> change
>>>>>> to
>>>>>>>> the
>>>>>>>>>>> current evolution rules. Right now we guarantee that a reader
>> will
>>>>>> get
>>>>>>>> an
>>>>>>>>>>> error if the data has an unknown union branch rather than
>> getting a
>>>>>>>>>>> default
>>>>>>>>>>> value. I think that makes sense: if the reader is requesting a
>>>> field,
>>>>>>>> it
>>>>>>>>>>> should get the actual datum for it rather than a default because
>> it
>>>>>>>>>>> doesn't
>>>>>>>>>>> know how to handle it.
>>>>>>>>>>> 
>>>>>>>>>>> Could you give us an example use case that requires this new
>> logic?
>>>>>>>>>>> 
>>>>>>>>>>> I just want to make sure we can't solve your problem another way.
>>>> For
>>>>>>>>>>> example, pushing evolution lower in the schema usually does the
>>>>>> trick:
>>>>>>>>>>> rather than having ["null", "RecordV1"] => ["null", "RecordV1",
>>>>>>>>>>> "RecordV2"], it is usually better to update the record so that
>>>> older
>>>>>>>>>>> readers can ignore the new fields.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> 
>>>>>>>>>>> rb
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Mar 14, 2016 at 7:30 AM, Yacine Benabderrahmane <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>> 
>>>>>>>>>>>> In order to provide a solution to the union schema evolution
>>>>>> problem,
>>>>>>>>>>> as it
>>>>>>>>>>>> was earlier notified in the thread "add a type to a union
>>>>>>>>>>>> <
>>>> 
>> http://search-hadoop.com/m/F2svI1IXrQS1bIFgU1/union+evolution&subj=add+a+type+to+a+union
>>>>>>>>>>>>> "
>>>>>>>>>>>> of the user mailing list, we decided, for the needs of the
>>>> reactive
>>>>>>>>>>>> architecture we have implemented for one of our clients, to
>>>>>> implement
>>>>>>>> an
>>>>>>>>>>>> evolution of the compatibility principle of Avro when using
>>>> Unions.
>>>>>>>> For
>>>>>>>>>>>> reminder, the asked question was about the way to handle the
>> case
>>>>>>>> where
>>>>>>>>>>> a
>>>>>>>>>>>> reader, using an old version of a schema that includes a union,
>>>>>> reads
>>>>>>>>>>> some
>>>>>>>>>>>> data written with a new version of the schema where a type has
>>>> been
>>>>>>>>>>> added
>>>>>>>>>>>> to the union.
>>>>>>>>>>>> 
>>>>>>>>>>>> As answered by Martin Kleppman in that thread, one way to handle
>>>>>> this
>>>>>>>>>>> kind
>>>>>>>>>>>> of evolution (a new version of the schema adds a new type type
>> in
>>>> a
>>>>>>>>>>> union)
>>>>>>>>>>>> would be to ensure that all the development streams have
>>>> integrated
>>>>>>>> the
>>>>>>>>>>> new
>>>>>>>>>>>> schema B before deploying it in the IT schema referential.
>>>>>>>>>>>> However, in big structures involving strongly uncorrelated teams
>>>> (in
>>>>>>>> the
>>>>>>>>>>>> product life-cycle point of view), this approach appears to be
>>>> quite
>>>>>>>>>>>> impracticable, causing production stream congestion, blocking
>>>>>> behavior
>>>>>>>>>>>> between teams, and a bunch of other
>>>>>>>>>>>> unwanted-counter-agile-/-reactive-phenomena...
>>>>>>>>>>>> 
>>>>>>>>>>>> Therefore, we had to implement a new *compatibility* *mode* for
>>>> the
>>>>>>>>>>> unions,
>>>>>>>>>>>> while taking care to comply with the following rules:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. Clear rules of compatibility are stated and integrated for
>> this
>>>>>>>>>>>> compatibility mode
>>>>>>>>>>>> 2. The standard Avro behavior must be kept intact
>>>>>>>>>>>> 3. All the evolution implementation must be done without
>>>>>> introducing
>>>>>>>>>>> any
>>>>>>>>>>>> regression (all existing tests of the Avro stack must succeed)
>>>>>>>>>>>> 4. The code impact on Avro stack must be minimized
>>>>>>>>>>>> 
>>>>>>>>>>>> Just to give you a very brief overview (as I don't know if this
>> is
>>>>>>>>>>> actually
>>>>>>>>>>>> the place for a full detailed description), the evolution
>>>> addresses
>>>>>>>> the
>>>>>>>>>>>> typical problem where two development streams use the same
>> schema
>>>>>> but
>>>>>>>> in
>>>>>>>>>>>> different versions, in the case described shortly as follows:
>>>>>>>>>>>> 
>>>>>>>>>>>> - The first development stream, called "DevA", uses the version
>> A
>>>>>> of
>>>>>>>>>>> a
>>>>>>>>>>>> schema which integrates a union referencing two types, say
>> "null"
>>>>>>>> and
>>>>>>>>>>>> "string". The default value is set to null.
>>>>>>>>>>>> - The second development team, called "DevB", uses the version
>> B,
>>>>>>>>>>> which
>>>>>>>>>>>> is an evolution of the version A, as it adds a reference to a
>> new
>>>>>>>>>>> type
>>>>>>>>>>>> in
>>>>>>>>>>>> the former union, say "long" (which makes it "null", string" and
>>>>>>>>>>> "long")
>>>>>>>>>>>> - When the schema B is deployed on the schema referential (in
>> our
>>>>>>>>>>> case,
>>>>>>>>>>>> the IO Confluent Schema Registry) subsequently to the version A
>>>>>>>>>>>>  - The stream "DevA" must be able to read with schema A, even
>> if
>>>>>>>>>>> the
>>>>>>>>>>>>  data has been written using the schema B with the type "long"
>>>> in
>>>>>>>>>>>> the union.
>>>>>>>>>>>>  In the latter case, the read value is the union default value
>>>>>>>>>>>>  - The stream "DevB" must be able to read/write with schema B,
>>>>>>>>>>> even if
>>>>>>>>>>>>  it writes the data using the type "long" in the union
>>>>>>>>>>>> 
>>>>>>>>>>>> The evolution that we implemented for this mode includes some
>>>> rules
>>>>>>>> that
>>>>>>>>>>>> are based on the principles stated in the Avro documentation. It
>>>> is
>>>>>>>> even
>>>>>>>>>>>> more powerful than showed in the few lines above, as it enables
>>>> the
>>>>>>>>>>> readers
>>>>>>>>>>>> to get the default value of the union if the schema used for
>>>> reading
>>>>>>>>>>> does
>>>>>>>>>>>> not contain the type used by the writer in the union. This
>>>> achieves
>>>>>> a
>>>>>>>>>>> new
>>>>>>>>>>>> mode of forward / backward compatibility. This evolution is for
>>>> now
>>>>>>>>>>> working
>>>>>>>>>>>> perfectly, and should be on production in the few coming weeks.
>> We
>>>>>>>> have
>>>>>>>>>>>> also made an evolution of the IO Confluent Schema Registry stack
>>>> to
>>>>>>>>>>> support
>>>>>>>>>>>> it, again in a transparent manner (we also intend to contribute
>> to
>>>>>>>> this
>>>>>>>>>>>> stack in a second / parallel step).
>>>>>>>>>>>> 
>>>>>>>>>>>> In the objective of contributing to the Avro stack with this new
>>>>>>>>>>>> compatibility mode for unions, I have some questions about the
>>>>>>>>>>> procedure:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. How can I achieve the contribution proposal? Should I
>> directly
>>>>>>>>>>>> provide a patch in JIRA and dive into the details right there?
>>>>>>>>>>>> 2. The base version of this evolution is 1.7.7, is it eligible
>> to
>>>>>>>>>>>> contribution evaluation anyway?
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks in advance, looking forward to hearing from you and
>> giving
>>>>>> you
>>>>>>>>>>> more
>>>>>>>>>>>> details.
>>>>>>>>>>>> 
>>>>>>>>>>>> Kind Regards,
>>>>>>>>>>>> --
>>>>>>>>>>>> *Yacine Benabderrahmane*
>>>>>>>>>>>> Architect
>>>>>>>>>>>> *OCTO Technology*
>>>>>>>>>>>> <http://www.octo.com>
>>>>>>>>>>>> -----------------------------------------------
>>>>>>>>>>>> Tel : +33 6 10 88 25 98
>>>>>>>>>>>> 50 avenue des Champs Elysées
>>>>>>>>>>>> 75008 PARIS
>>>>>>>>>>>> www.octo.com
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> Ryan Blue
>>>>>>>>>>> Software Engineer
>>>>>>>>>>> Netflix
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Ryan Blue
>>>>>>> Software Engineer
>>>>>>> Netflix
>>>>> 
>>>>> 
>>>>> --
>>>>> Ryan Blue
>>>>> Software Engineer
>>>>> Netflix
>>> 
>>> 
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>> 
>> 
> 
> 
> -- 
> Ryan Blue
> Software Engineer
> Netflix

Re: Avro union compatibility mode enhancement proposal

Reply via email to