Hi all, I wanted to start a thread discussing Avro cross-version support in parquet-java. The parquet-avro module has compiled against Avro 1.11 since the 1.13 release, but we've continued to ship fixes and feature support for the Avro 1.8 APIs (ex1 <https://github.com/apache/parquet-java/pull/2957>, ex2 <https://github.com/apache/parquet-java/pull/2993>).
Mostly, the Avro APIs referenced by parquet-avro are cross-version-compatible, with a few exceptions [1]:

- Evolution of the Schema constructor APIs
- New logical types (e.g., local timestamp and UUID)
- Renamed logical type conversion helpers
- Generated code for datetime types using Java Time vs. Joda Time for setters/getters

Some of these are hard to catch when Parquet is compiled and tested against Avro 1.11 only. Additionally, as a user who currently relies mostly on Avro 1.8, I'm not sure how much longer Parquet will continue to support it.

I have two proposals to build confidence and clarity around parquet-avro:

- Codify in the parquet-avro documentation <https://github.com/apache/parquet-java/blob/master/parquet-avro/README.md> which Avro versions are officially supported and which are deprecated or explicitly unsupported.
- Add some kind of automated testing against all supported Avro versions. This is a bit tricky because, as mentioned above, the generated SpecificRecord classes use incompatible logical type APIs across Avro versions, so we'd have to find a way to invoke avro-compiler/load the Avro core library for each version... this would probably require a multi-module setup (see the sketch in footnote [2]).

I'd love to know what the Parquet community thinks about these ideas. I'm also interested to learn what Avro versions other Parquet users rely on. There seems to be a lot of variance across the data ecosystem: Spark keeps up to date with the latest Avro version, Hadoop has Avro 1.9 pinned, and Apache Beam used to be tightly coupled to 1.8 but has recently been refactored to be version-agnostic.

Best,
Claire
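
[1] To make the incompatibilities concrete, here is a minimal sketch of the kind of reflection-based probe that can paper over version-dependent Avro APIs. The AvroCompat helper is hypothetical (it is not existing parquet-avro code), but the Avro names it probes are real: LogicalTypes.localTimestampMillis() only exists in newer releases, and Avro 1.9 replaced the Joda-based TimeConversions.TimestampConversion with the java.time-based TimestampMillisConversion.

    import org.apache.avro.LogicalTypes;

    // Hypothetical helper: probes the Avro library on the classpath for
    // APIs that only exist in newer releases, so callers can degrade
    // gracefully when running against Avro 1.8/1.9.
    public final class AvroCompat {

      private AvroCompat() {}

      // The local-timestamp logical types are absent from older Avro
      // releases, so check for the factory method instead of calling it.
      public static boolean supportsLocalTimestampMillis() {
        return hasMethod(LogicalTypes.class, "localTimestampMillis");
      }

      // Avro 1.9 renamed the Joda-based TimestampConversion to the
      // java.time-based TimestampMillisConversion; the presence of the
      // new nested class distinguishes the two API generations.
      public static boolean hasJavaTimeConversions() {
        return hasClass("org.apache.avro.data.TimeConversions$TimestampMillisConversion");
      }

      private static boolean hasMethod(Class<?> cls, String name, Class<?>... params) {
        try {
          cls.getMethod(name, params);
          return true;
        } catch (NoSuchMethodException e) {
          return false;
        }
      }

      private static boolean hasClass(String name) {
        try {
          Class.forName(name);
          return true;
        } catch (ClassNotFoundException e) {
          return false;
        }
      }
    }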
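
[2] And a sketch of the kind of version-agnostic round-trip test that could live in a shared source tree and be compiled once per supported Avro version in a multi-module setup. The class name and layout are made up for illustration; it deliberately sticks to Avro APIs that are stable from 1.8 through 1.11 (Schema.Parser, GenericData) plus the existing AvroParquetWriter/AvroParquetReader builders.

    import java.nio.file.Files;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.avro.AvroParquetReader;
    import org.apache.parquet.avro.AvroParquetWriter;
    import org.apache.parquet.hadoop.ParquetReader;
    import org.apache.parquet.hadoop.ParquetWriter;

    public class CrossVersionRoundTripTest {

      public static void main(String[] args) throws Exception {
        // Only APIs that behave the same across Avro 1.8-1.11 are used,
        // so the identical source file can be compiled in a module
        // pinned to each Avro version under test.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"long\"},"
                + "{\"name\":\"name\",\"type\":\"string\"}]}");

        GenericRecord record = new GenericData.Record(schema);
        record.put("id", 1L);
        record.put("name", "claire");

        Path file = new Path(
            Files.createTempDirectory("avro-compat").resolve("user.parquet").toString());

        try (ParquetWriter<GenericRecord> writer =
            AvroParquetWriter.<GenericRecord>builder(file).withSchema(schema).build()) {
          writer.write(record);
        }

        try (ParquetReader<GenericRecord> reader =
            AvroParquetReader.<GenericRecord>builder(file).build()) {
          GenericRecord read = reader.read();
          if (!Long.valueOf(1L).equals(read.get("id"))) {
            throw new AssertionError("round-trip mismatch: " + read.get("id"));
          }
        }
      }
    }

A SpecificRecord variant of the same test is where the multi-module setup would earn its keep, since avro-compiler would have to generate the test classes separately for each Avro version.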