[ https://issues.apache.org/jira/browse/SPARK-27506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fokko Driesprong updated SPARK-27506:
-------------------------------------
    Fix Version/s: 3.0.0

> Function `from_avro` doesn't allow deserialization of data using other compatible schemas
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-27506
>                 URL: https://issues.apache.org/jira/browse/SPARK-27506
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Gianluca Amori
>            Assignee: Fokko Driesprong
>            Priority: Major
>             Fix For: 3.0.0
>
> SPARK-24768 and its subtasks introduced support for reading and writing Avro data by parsing a binary column in Avro format and converting it into the corresponding Catalyst value (and vice versa).
>
> The current implementation has the limitation of requiring that an event be deserialized with the exact schema with which it was serialized. This breaks one of Avro's most important features, schema evolution [https://docs.confluent.io/current/schema-registry/avro.html] - most importantly, the ability to read old data with a newer (compatible) schema without breaking the consumer.
>
> The GenericDatumReader in the Avro library already supports passing an optional *writer's schema* (the schema with which the record was serialized) alongside the mandatory *reader's schema* (the schema with which the record is going to be deserialized). The proposed change is to do the same in the from_avro function, allowing an optional writer's schema to be passed and used during deserialization.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
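To make the schema-evolution point concrete, here is a toy sketch of the reader's/writer's schema resolution that GenericDatumReader performs. This is plain Python, not Spark or Avro library code, and the field names and defaults are hypothetical; it only illustrates the Avro rule that fields present in both schemas are read as written, reader-only fields fall back to their declared default, and writer-only fields are dropped.

```python
# Toy illustration of Avro-style schema resolution (NOT Spark or the Avro
# library). writer_fields describes what the producer actually wrote;
# reader_fields (name -> default, None = no default) is the consumer's
# newer, compatible schema.

def resolve(record, writer_fields, reader_fields):
    """Read a record written under writer_fields using reader_fields."""
    out = {}
    for name, default in reader_fields.items():
        if name in writer_fields:
            out[name] = record[name]      # field present in both schemas
        elif default is not None:
            out[name] = default           # evolved field: use its default
        else:
            # Incompatible evolution: a new field with no default cannot
            # be resolved against old data.
            raise ValueError(f"no default for new field {name!r}")
    return out

# An old producer wrote two fields; the consumer's schema later added an
# optional "country" field with a default (hypothetical example).
writer_fields = {"id", "name"}
reader_fields = {"id": None, "name": None, "country": "unknown"}

old_record = {"id": 1, "name": "alice"}
print(resolve(old_record, writer_fields, reader_fields))
# -> {'id': 1, 'name': 'alice', 'country': 'unknown'}
```

Without access to the writer's schema, from_avro can only attempt the exact-match case; passing both schemas is what lets old data be read under the newer schema, as sketched above.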