Hi,

Currently, Avro records are supported in Spark - but with the limitation
that we must specify the reader and writer schemas up front.
For writing out an Avro record that is fine - but for reading Avro records
it is usually a problem: producers upgrade and change their schemas over
time, and the current API has no way to handle that.
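
For reference, this is roughly what the current API forces us into
(kafkaDf is just a placeholder for a DataFrame read from Kafka, and the
User schema is a made-up example):

    import org.apache.spark.sql.avro.functions.from_avro
    import org.apache.spark.sql.functions.col

    // kafkaDf: a DataFrame read from Kafka, with a binary "value" column.
    // The reader schema has to be baked in as a JSON literal; if producers
    // evolve the writer schema, this call has no way to discover that.
    val userSchemaJson =
      """{"type":"record","name":"User","fields":[
        |  {"name":"id","type":"long"},
        |  {"name":"name","type":"string"}]}""".stripMargin

    val parsed = kafkaDf.select(from_avro(col("value"), userSchemaJson).as("user"))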

Confluent Schema Registry provides such functionality by embedding the
schema id in each message - so we can fetch the exact writer schema, read
through any changes, and still project onto whatever output schema we wish
(as long as the schemas are compatible, of course).
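
Decoding that id is trivial - the Confluent wire format is one magic byte
followed by a 4-byte big-endian schema id, then the Avro payload:

    import java.nio.ByteBuffer

    // Confluent wire format: [magic byte 0x0][4-byte big-endian schema id][Avro payload]
    def confluentSchemaId(message: Array[Byte]): Int = {
      require(message.length >= 5 && message(0) == 0, "not Confluent-framed Avro")
      ByteBuffer.wrap(message, 1, 4).getInt
    }
    // The id maps to the writer schema via the registry's REST API,
    // e.g. GET /schemas/ids/{id}.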

The open source ABRiS project does supply that functionality, but I think
that small tweaks to the current Spark implementation should provide a good
enough solution for 99% of the cases - without the need to go to another
project that duplicates much of the functionality Spark already has. A
rough sketch of what I have in mind is below.
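
To make it concrete - the three-argument from_avro(data, jsonFormatSchema,
options) overload already exists, so the tweak could be as small as a few
new options. To be clear, the option names below do not exist today and
are purely made up for illustration; kafkaDf and readerSchemaJson are the
same placeholders as above:

    import scala.jdk.CollectionConverters._

    // Sketch only: these option names are invented to illustrate the idea,
    // not an existing Spark API.
    val parsed = kafkaDf.select(
      from_avro(
        col("value"),
        readerSchemaJson,                                  // the output schema we want
        Map(
          "schema.registry.url" -> "http://registry:8081", // hypothetical option
          "avro.payload.format" -> "confluent"             // hypothetical option
        ).asJava
      ).as("user"))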

I did see an old ticket for this that never truly matured:
https://issues.apache.org/jira/browse/SPARK-34652

I would like to open a PR to add this functionality to Spark - I just
wanted to check first whether there was a reason for not doing it. Maybe
there was a specific reason for not supporting Schema Registry?


Thanks!
Nimrod
