[jira] [Comment Edited] (AVRO-2918) Schema polymorphism

Oscar Westra van Holthe - Kind (Jira) Thu, 21 Jul 2022 04:19:58 -0700


    [ 
https://issues.apache.org/jira/browse/AVRO-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569357#comment-17569357
 ]


Oscar Westra van Holthe - Kind edited comment on AVRO-2918 at 7/21/22 11:18 AM:
--------------------------------------------------------------------------------

For polymorphism, as I see it, there are basically two options:
 # Limit polymorphism to schema definitions. This means that for arrays, one 
must manually make a union for the elements.
Pro: encoding & decoding need not change
Con: creating these unions is cumbersome and error-prone
 # Use a {_}discriminator{_}: a field whose value determines which 
subclass to use.
(the discriminator field must come before subschema fields, and all subschemas 
must be known; effectively sealing the schema against later extensions)
Pro: no explicit union needed
Con: encoding & decoding changes 
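
To make option 1 concrete, here is a minimal sketch of an array whose 
elements are a hand-written union of every subtype. The record names 
(Circle, Square) and their fields are hypothetical and serve only as 
illustration; the schemas are built as plain Python dicts and printed as 
Avro schema JSON.

```python
import json

# Hypothetical subtypes, invented for illustration only.
circle = {
    "type": "record",
    "name": "Circle",
    "fields": [{"name": "radius", "type": "double"}],
}
square = {
    "type": "record",
    "name": "Square",
    "fields": [{"name": "side", "type": "double"}],
}

# Option 1: the array's item type is an explicit union (a JSON array)
# that must list every subtype by hand.
shapes = {"type": "array", "items": [circle, square]}

print(json.dumps(shapes, indent=2))
```

Every array, map or field that should accept any subtype needs the same 
union spelled out again, which is where the "cumbersome and error-prone" 
part comes from.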

 

The option of (silently) dropping fields that don't fit the parent type is not 
a valid option IMHO, as it changes the data.

 

I like the discriminator option, but it does involve quite a bit of work.

First of all, because the discriminator field decides which subschema to use, 
we must create a schema {*}set{*}: a schema with all subschemas in a single 
definition.

Next, such a change in definition affects not just the spec, but also all 
schema parsing and the encoding/decoding of values. This is quite a big 
change.
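
A rough sketch of what a discriminator-driven schema set could look like, 
and how a decoder might pick the subschema. Note that this is NOT valid 
Avro today: the "discriminator" and "subschemas" keys, the record names and 
the fields_for helper are all invented for illustration.

```python
# Hypothetical "schema set": not an existing Avro construct.
schema_set = {
    "name": "Shape",
    "discriminator": "kind",
    # The discriminator field comes before any subschema fields.
    "fields": [{"name": "kind", "type": "string"}],
    # All subschemas must be known up front (the set is sealed).
    "subschemas": {
        "circle": [{"name": "radius", "type": "double"}],
        "square": [{"name": "side", "type": "double"}],
    },
}


def fields_for(schema_set, discriminator_value):
    """Return the ordered field list selected by the discriminator value."""
    try:
        extra = schema_set["subschemas"][discriminator_value]
    except KeyError:
        raise ValueError(f"unknown subschema: {discriminator_value!r}") from None
    return schema_set["fields"] + extra


print([f["name"] for f in fields_for(schema_set, "circle")])
```

A decoder would have to read the discriminator value first and only then 
know which remaining fields to expect, which is why both schema parsing and 
the encoding/decoding code paths would need to change.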

 

All in all, it may be more effective to create a union field containing one of 
several records, each containing the unique fields of a subschema.
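
As a minimal sketch of that alternative, using only constructs Avro already 
supports: the shared fields stay on the parent record, and a single union 
field holds one of several records with the subtype-specific fields. All 
names (Shape, label, details, CircleDetails, SquareDetails) are 
hypothetical.

```python
import json

# Hypothetical names throughout; only standard Avro constructs are used.
shape = {
    "type": "record",
    "name": "Shape",
    "fields": [
        # Fields shared by all "subtypes" live on the parent record.
        {"name": "label", "type": "string"},
        # One union field carries the fields unique to each subtype.
        {
            "name": "details",
            "type": [
                {
                    "type": "record",
                    "name": "CircleDetails",
                    "fields": [{"name": "radius", "type": "double"}],
                },
                {
                    "type": "record",
                    "name": "SquareDetails",
                    "fields": [{"name": "side", "type": "double"}],
                },
            ],
        },
    ],
}

print(json.dumps(shape, indent=2))
```

The union is written once, on the parent record, so arrays and other 
containers can simply refer to Shape.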


was (Author: opwvhk):
For polymorphism, as I see it, there are basically two options:
 # Limit polymorphism to schema definitions. This means that for arrays, one 
must manually make a union for the elements.
Pro: encoding & decoding need not change
Con: creating these unions is cumbersome and error prone
 # Use a {_}discriminator{_}: a fields whose field value determines which 
subclass to use.
(the discriminator field must come before subschema fields, and all subschemas 
must be known; effectively sealing the the schema against later extensions)
Pro: no explicit union needed
Con: encoding & decoding changes 

 

The option of (silently) dropping fields that don't fit the parent type is not 
a valid option IMHO, as it changes the data.

> Schema polymorphism
> -------------------
>
>                 Key: AVRO-2918
>                 URL: https://issues.apache.org/jira/browse/AVRO-2918
>             Project: Apache Avro
>          Issue Type: New Feature
>          Components: logical types, misc, spec
>            Reporter: Jonathan Rapoport
>            Priority: Critical
>              Labels: features
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Include the option to use named types as base types for a new schema. Allow 
> for MRO generation. Field inheritance. 
> The benefits of this approach include:
>  * Defining a schema as validation for a certain wire, and so allowing the 
> receiver to be certain of the structure of the data (this works today). 
> However, defining an extension of this schema, or certain schemas which can 
> be normalized to the original schema, but contain additional information, 
> will not allow it to be sent over the same wire.
>  * Backwards compatibility through inheritance - you never break the old 
> schema, thus allowing a long integration period, with no need to recode all 
> processes familiar with the schema. The new schema will simply inherit the 
> old one, and only add information.
>  * Allow for full data control through polymorphism, and the ability to 
> replace structures within any supported language. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
