Hadoop uses its own shaded Avro 1.11 library internally (hadoop-thirdparty shaded-avro-1.11). I think it is stuck on the public classpath because some code bridges to Avro (org.apache.hadoop.fs.AvroFSInput). I'm tempted to tag that as @Deprecated to move people off it, so that there are no public Avro dependencies there.
Meaning: you won't break bits of the Hadoop codebase if you upgrade. As for Hadoop 2, time to purge it; everyone's life is better.

On Tue, 27 Aug 2024 at 10:38, Fokko Driesprong <fo...@apache.org> wrote:

> Hey Claire,
>
> Thanks for raising this.
>
> 1.8.x -> 1.9.x is the most problematic upgrade because it breaks some
> public APIs. We had Jackson objects in the public API, and those broke
> when we switched from codehaus to fasterxml. Ideally, I would love to
> drop Avro 1.8 support (May 2017), but if it is still widely used, then
> we can check what it takes to bring back support.
>
> For the testing, I was hoping that we could leverage a profile, similar
> to what we do with Hadoop 2
> <https://github.com/apache/parquet-java/blob/312a15f53a011d1dc4863df196c0169bdf6db629/pom.xml#L638-L643>.
>
> Both proposals are great, and happy to help!
>
> Kind regards,
> Fokko
>
> On Tue, 27 Aug 2024 at 10:04, Gábor Szádovszky <ga...@apache.org> wrote:
>
> > Hi Claire,
> >
> > Thanks for bringing this up.
> >
> > Since Avro has incompatibilities between these releases (which is
> > natural, since the second number of an Avro version is considered the
> > major one), we can only state compatibility with a version if we
> > actually test with it. So, I would vote for your second proposal, or
> > even both.
> >
> > Which Avro versions do you think we should support? (I think we need
> > to support the latest one, plus any major ones below it that we
> > consider required.)
> >
> > I am not sure whether we need separate modules to actually be released
> > for the different Avro versions, or whether this is only required for
> > testing. In the first case, it will be quite obvious which Avro
> > versions we support, since they will be part of the package naming.
> >
> > If you want to invest effort in this, I am happy to help with
> > reviewing.
> >
> > Cheers,
> > Gabor
> >
> > Claire McGinty <claire.d.mcgi...@gmail.com> wrote (on Mon, 26 Aug
> > 2024 at 19:01):
> >
> > > Hi all,
> > >
> > > I wanted to start a thread discussing Avro cross-version support in
> > > parquet-java. The parquet-avro module has been on Avro 1.11 since
> > > the 1.13 release, but since then we've made fixes and added feature
> > > support for Avro 1.8 APIs (ex1
> > > <https://github.com/apache/parquet-java/pull/2957>, ex2
> > > <https://github.com/apache/parquet-java/pull/2993>).
> > >
> > > Most of the Avro APIs referenced by parquet-avro are
> > > cross-version-compatible, with a few exceptions:
> > >
> > > - Evolution of the Schema constructor APIs
> > > - New logical types (i.e., local timestamp and UUID)
> > > - Renamed logical type conversion helpers
> > > - Generated code for datetime types using Java Time vs. Joda Time
> > >   for setters/getters
> > >
> > > Some of these are hard to catch when Parquet is compiled and tested
> > > with Avro 1.11 only. Additionally, as a user who currently relies
> > > mostly on Avro 1.8, I'm not sure how much longer Parquet will
> > > continue to support it.
> > >
> > > I have two proposals to build confidence and clarity around
> > > parquet-avro:
> > >
> > > - Codifying in the parquet-avro documentation
> > >   <https://github.com/apache/parquet-java/blob/master/parquet-avro/README.md>
> > >   which Avro versions are officially supported and which are
> > >   deprecated/explicitly unsupported
> > > - Adding some kind of automated testing against all supported Avro
> > >   versions. This is a bit tricky because, as I mentioned, the
> > >   generated SpecificRecord classes use incompatible logical type
> > >   APIs across Avro versions, so we'd have to find a way to invoke
> > >   avro-compiler/load the Avro core library for different versions...
> > >   this would probably require a multi-module setup.
> > >
> > > I'd love to know what the Parquet community thinks about these
> > > ideas. Additionally, I'm interested to learn more about which Avro
> > > versions other Parquet users rely on. There seems to be a lot of
> > > variance across the data ecosystem: Spark keeps up to date with the
> > > latest Avro version, Hadoop has Avro 1.9 pinned, and Apache Beam
> > > used to be tightly coupled to 1.8, but has recently refactored to be
> > > version-agnostic.
> > >
> > > Best,
> > > Claire
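The profile approach Fokko suggests could look roughly like the following. This is a minimal sketch only: the profile id and the `avro.version` property name are illustrative assumptions modeled on the linked hadoop2 profile, not the actual parquet-java build config.

```xml
<!-- Hypothetical Maven profile; id and property name are assumptions. -->
<!-- Activate with: mvn verify -Pavro-1.8 -->
<profile>
  <id>avro-1.8</id>
  <properties>
    <!-- Override the Avro version the build compiles and tests against -->
    <avro.version>1.8.2</avro.version>
  </properties>
</profile>
```

A CI matrix could then run the test suite once per profile, which is essentially how the Hadoop 2 profile linked above is exercised.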
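One way to stay version-agnostic, in the spirit of the Beam refactor mentioned above, is to probe the classpath reflectively instead of linking against version-specific APIs. A minimal sketch, assuming a hypothetical `AvroVersionProbe` helper (the probed class names are real Avro classes; the helper itself is not parquet-java code):

```java
// Hypothetical sketch: detect which Avro generation is on the classpath by
// checking for classes that exist only from certain versions onward.
public class AvroVersionProbe {

    // Returns true iff the named class can be loaded (without initializing it).
    static boolean hasClass(String name) {
        try {
            Class.forName(name, false, AvroVersionProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Avro 1.9 replaced the Joda-based TimeConversions.TimestampConversion
        // with the java.time-based TimestampMillisConversion.
        boolean javaTimeConversions =
            hasClass("org.apache.avro.data.TimeConversions$TimestampMillisConversion");

        // Avro 1.10 added the local-timestamp logical types.
        boolean localTimestamps =
            hasClass("org.apache.avro.LogicalTypes$LocalTimestampMillis");

        System.out.println("java.time conversions (Avro 1.9+): " + javaTimeConversions);
        System.out.println("local timestamps (Avro 1.10+): " + localTimestamps);
    }
}
```

A shim layer in parquet-avro could branch on probes like these at runtime, which would sidestep compile-time incompatibilities at the cost of some reflection plumbing.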