[
https://issues.apache.org/jira/browse/AVRO-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569357#comment-17569357
]
Oscar Westra van Holthe - Kind edited comment on AVRO-2918 at 7/21/22 11:18 AM:
--------------------------------------------------------------------------------
For polymorphism, as I see it, there are basically two options:
# Limit polymorphism to schema definitions. This means that for arrays, one
must manually make a union for the elements.
Pro: encoding & decoding need not change
Con: creating these unions is cumbersome and error-prone
# Use a {_}discriminator{_}: a field whose value determines which
subclass to use.
(the discriminator field must come before subschema fields, and all subschemas
must be known, effectively sealing the schema against later extensions)
Pro: no explicit union needed
Con: encoding & decoding changes
The option of (silently) dropping fields that don't fit the parent type is not
a valid option IMHO, as it changes the data.
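To make option 1 concrete, here is a minimal sketch of an array whose elements
are an explicit union of record schemas. The record names (Animal, Dog, Cat)
and their fields are hypothetical, invented only for illustration. The schema
is built as a plain Python dict and printed in the JSON form Avro schemas use;
no Avro library is involved.

```python
import json

# Hypothetical base fields. Avro has no inheritance, so each "subclass"
# record has to repeat them by hand.
BASE_FIELDS = [{"name": "name", "type": "string"}]


def record(name, extra_fields):
    """Build a record schema that repeats the base fields and adds its own."""
    return {"type": "record", "name": name, "fields": BASE_FIELDS + extra_fields}


animal = record("Animal", [])
dog = record("Dog", [{"name": "breed", "type": "string"}])
cat = record("Cat", [{"name": "indoor", "type": "boolean"}])

# Option 1: the array's item type is a manually maintained union.
# Every new subtype must be added here, which is the error-prone part.
animals_schema = {"type": "array", "items": [animal, dog, cat]}

schema_json = json.dumps(animals_schema, indent=2)
print(schema_json)
```

Encoding and decoding are untouched because unions are already part of the
spec; the cost is that every place that accepts the parent type needs the same
hand-written union.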
I like the discriminator option, but it does involve quite a bit of work.
First of all, because the discriminator field decides which subschema to use,
we must create a schema {*}set{*}: a schema with all subschemas in a
single definition.
Next, such a change in definition not only updates the spec, but also
affects all schema parsing and the encoding/decoding of values. This is quite
a big change.
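As a rough illustration of why the discriminator option needs a schema set,
here is a toy decoder. It is not Avro's binary encoding and uses no Avro API:
values arrive as a flat list, and both the SCHEMA_SET registry and the "kind"
discriminator are names made up for this sketch. It only shows that the
discriminator has to be read before the subschema fields, and that an unknown
discriminator value cannot be decoded.

```python
# Hypothetical sealed schema set: discriminator value -> ordered field names.
# In a real design this would be part of the schema definition itself.
SCHEMA_SET = {
    "dog": ["name", "breed"],
    "cat": ["name", "indoor"],
}


def decode(values):
    """Decode a flat value sequence whose first item is the discriminator."""
    it = iter(values)
    kind = next(it)  # the discriminator must come first
    if kind not in SCHEMA_SET:
        # The set is sealed: a subschema added later cannot be read.
        raise ValueError(f"unknown subschema: {kind!r}")
    decoded = {"kind": kind}
    for field in SCHEMA_SET[kind]:
        decoded[field] = next(it)
    return decoded


print(decode(["dog", "Rex", "collie"]))
```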
All in all, it may be more effective to create a union field containing one of
several records, each containing the unique fields of a subschema.
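This last alternative can be written with today's Avro. A minimal sketch, with
hypothetical names (Animal, DogDetails, CatDetails, details): the shared fields
live in one record, and a union field holds a record with the fields unique to
each subtype. As above, the schema is just a Python dict printed as JSON.

```python
import json

# Hypothetical records holding only the fields unique to each subtype.
dog_details = {
    "type": "record",
    "name": "DogDetails",
    "fields": [{"name": "breed", "type": "string"}],
}
cat_details = {
    "type": "record",
    "name": "CatDetails",
    "fields": [{"name": "indoor", "type": "boolean"}],
}

# The parent record keeps the common fields and adds one union field
# that selects the subtype-specific part.
animal_schema = {
    "type": "record",
    "name": "Animal",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "details", "type": [dog_details, cat_details]},
    ],
}

animal_json = json.dumps(animal_schema, indent=2)
print(animal_json)
```

With this shape an array of Animal needs no union of its own: the union sits
inside the record, so it is written once instead of at every use of the parent
type.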
was (Author: opwvhk):
For polymorphism, as I see it, there are basically two options:
# Limit polymorphism to schema definitions. This means that for arrays, one
must manually make a union for the elements.
Pro: encoding & decoding need not change
Con: creating these unions is cumbersome and error-prone
# Use a {_}discriminator{_}: a field whose value determines which
subclass to use.
(the discriminator field must come before subschema fields, and all subschemas
must be known, effectively sealing the schema against later extensions)
Pro: no explicit union needed
Con: encoding & decoding changes
The option of (silently) dropping fields that don't fit the parent type is not
a valid option IMHO, as it changes the data.
> Schema polymorphism
> -------------------
>
> Key: AVRO-2918
> URL: https://issues.apache.org/jira/browse/AVRO-2918
> Project: Apache Avro
> Issue Type: New Feature
> Components: logical types, misc, spec
> Reporter: Jonathan Rapoport
> Priority: Critical
> Labels: features
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> Include the option to use named types as base types for a new schema. Allow
> for MRO generation. Field inheritance.
> The benefits of this approach include:
> * Defining a schema as validation for a certain wire, and so allowing the
> receiver to be certain of the structure of the data (this works today).
> However, defining an extension of this schema, or certain schemas which can
> be normalized to the original schema, but contain additional information,
> will not allow it to be sent over the same wire.
> * Backwards compatibility through inheritance - you never break the old
> schema, thus allowing a long integration period, with no need to recode all
> processes familiar with the schema. The new schema will simply inherit the
> old one, and only add information.
> * Allow for full data control through polymorphism, and the ability to
> replace structures within any supported language.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)