Hadoop uses its own shaded Avro 1.11 lib internally
(hadoop-thirdparty/shaded-avro-1.11). I think Avro is stuck on the public
classpath because some classes bridge to it (org.apache.hadoop.fs.AvroFSInput).
I'm tempted to tag that as @Deprecated to move people off it, so that there
are no public Avro dependencies left.

Meaning: you won't break bits of the Hadoop codebase if you upgrade.

As for Hadoop 2: time to purge it. Everyone's life will be better.

On Tue, 27 Aug 2024 at 10:38, Fokko Driesprong <fo...@apache.org> wrote:

> Hey Claire,
>
> Thanks for raising this.
>
> 1.8.x -> 1.9.x is the most problematic upgrade because it breaks some
> public APIs. We had Jackson objects in the public API, and those broke when
> we switched from Codehaus to FasterXML. Ideally, I would love to drop Avro
> 1.8 support (released May 2017), but if it is still widely used, then we
> can check what it takes to bring back support.
>
> For the testing, I was hoping that we could leverage a Maven profile,
> similar to what we do with Hadoop 2
> <https://github.com/apache/parquet-java/blob/312a15f53a011d1dc4863df196c0169bdf6db629/pom.xml#L638-L643>.
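[A profile along these lines could pin a different Avro version per build. A hedged sketch of the general shape only; the profile id and the `avro.version` property name are assumptions, not taken from the linked pom:]

```xml
<!-- Hypothetical sketch: a Maven profile that pins an alternative Avro
     version for a test run. Profile id and property name are assumptions;
     it presumes the avro dependency declares
     <version>${avro.version}</version>. -->
<profile>
  <id>avro-1.8</id>
  <properties>
    <avro.version>1.8.2</avro.version>
  </properties>
</profile>
```

[Such a profile would be activated explicitly, e.g. `mvn verify -Pavro-1.8`.]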
>
> Both proposals are great, and happy to help!
>
> Kind regards,
> Fokko
>
>
>
> On Tue, 27 Aug 2024 at 10:04, Gábor Szádovszky <ga...@apache.org> wrote:
>
> > Hi Claire,
> >
> > Thanks for bringing this up.
> >
> > Since Avro has incompatibilities between these releases (which is
> > natural, since the second number of an Avro version is effectively the
> > major one), we can only claim compatibility with a version if we actually
> > test against it. So, I would vote for your second proposal, or even both.
> >
> > Which Avro versions do you think we should support? (I think we need to
> > support the latest one, plus whichever earlier major ones we consider
> > required.)
> >
> > I am not sure whether we need to actually release separate modules for
> > the different Avro versions, or whether this is only required for
> > testing. In the first case, it will be quite obvious which Avro versions
> > we support, since the version will be part of the package naming.
> >
> > If you want to invest efforts in this, I am happy to help with reviewing.
> >
> > Cheers,
> > Gabor
> >
> > Claire McGinty <claire.d.mcgi...@gmail.com> wrote on Mon, 26 Aug 2024 at
> > 19:01:
> >
> > > Hi all,
> > >
> > > I wanted to start a thread discussing Avro cross-version support in
> > > parquet-java. The parquet-avro module has been on Avro 1.11 since the
> > > 1.13 release, but since then we've made fixes and added feature support
> > > for Avro 1.8 APIs (ex1 <https://github.com/apache/parquet-java/pull/2957>,
> > > ex2 <https://github.com/apache/parquet-java/pull/2993>).
> > >
> > > The Avro APIs referenced by parquet-avro are mostly
> > > cross-version-compatible, with a few exceptions:
> > >
> > >    - Evolution of the Schema constructor APIs
> > >    - New logical types (i.e., local timestamp and UUID)
> > >    - Renamed logical type conversion helpers
> > >    - Generated code for datetime types using java.time vs. Joda-Time
> > >      for setters/getters
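[As one concrete instance of the renamed conversion helpers: Avro 1.9 replaced `TimeConversions.TimestampConversion` with `TimestampMillisConversion` as part of the Joda-Time to java.time move. A small, hypothetical classpath probe, not parquet-avro code, can tell the two lines apart:]

```java
// Hedged sketch: guess which Avro line is on the classpath by probing for
// classes whose names changed between 1.8 and 1.9+. The two Avro class names
// are real; the probe helper itself is a hypothetical illustration.
public class AvroVersionProbe {

    /** Returns true if the named class is loadable on the current classpath. */
    static boolean present(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Best-effort guess at the Avro minor line, or "unknown". */
    static String guessAvroLine() {
        // Avro 1.9 renamed TimestampConversion to TimestampMillisConversion
        // when generated datetime accessors moved from Joda-Time to java.time.
        if (present("org.apache.avro.data.TimeConversions$TimestampMillisConversion")) {
            return "1.9+";
        }
        if (present("org.apache.avro.data.TimeConversions$TimestampConversion")) {
            return "1.8";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // Prints "Avro line: unknown" when no Avro jar is on the classpath.
        System.out.println("Avro line: " + guessAvroLine());
    }
}
```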
> > >
> > > Some of these are hard to catch when Parquet is compiled and tested
> > > against Avro 1.11 only. Additionally, as a user who currently relies
> > > mostly on Avro 1.8, I'm not sure how much longer Parquet will continue
> > > to support it.
> > >
> > > I have two proposals to build confidence and clarity around
> > > parquet-avro:
> > >
> > >    - Codifying in the parquet-avro documentation
> > >      <https://github.com/apache/parquet-java/blob/master/parquet-avro/README.md>
> > >      which Avro versions are officially supported and which are
> > >      deprecated/explicitly not supported
> > >    - Adding some kind of automated testing with all supported Avro
> > >      versions. This is a bit tricky because, as I mentioned, the
> > >      generated SpecificRecord classes use incompatible logical type
> > >      APIs across Avro versions, so we'd have to find a way to invoke
> > >      avro-compiler and load the Avro core library at different
> > >      versions... this would probably require a multi-module setup.
> > >
> > > I'd love to know what the Parquet community thinks about these ideas.
> > > Additionally, I'm interested to learn what Avro versions other Parquet
> > > users rely on. There seems to be a lot of variance across the data
> > > ecosystem: Spark keeps up to date with the latest Avro version, Hadoop
> > > has Avro 1.9 pinned, and Apache Beam used to be tightly coupled to 1.8
> > > but has recently been refactored to be version-agnostic.
> > >
> > > Best,
> > > Claire
> > >
> >
>
