Great, thanks for the responses! Agree that the workarounds for 1.8 are
painful to manage - we still rely on it, but I'm not sure how many other
Parquet users do.

I'll look into the profiles idea as used for the Hadoop 2 checks, thanks Fokko
-- will aim for a PR in the next few weeks.

Best,
Claire

On Tue, Aug 27, 2024 at 2:16 PM Steve Loughran <[email protected]>
wrote:

> Hadoop uses its own shaded Avro 1.11 lib internally
> (hadoop-thirdparty/shaded-avro-1.11). I think it is stuck on the public
> classpath because some stuff bridges to Avro
> (org.apache.hadoop.fs.AvroFSInput). I'm tempted to tag that as Deprecated
> to move people off it, so that there are no public Avro dependencies there.
>
> Meaning: you won't break bits of the Hadoop codebase if you upgrade.
>
> As for Hadoop 2: time to purge it. Everyone's life is better.
>
> On Tue, 27 Aug 2024 at 10:38, Fokko Driesprong <[email protected]> wrote:
>
> > Hey Claire,
> >
> > Thanks for raising this.
> >
> > 1.8.x -> 1.9.x is the most problematic upgrade because it breaks some
> > public APIs. We had Jackson objects in the public API, and those broke
> > when we switched from codehaus to fasterxml. Ideally, I would love to
> > drop 1.8 Avro support (May 2017), but if it is still widely used, then
> > we can check what it takes to bring back support.
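> >
> > For illustration, a minimal sketch of the kind of break involved. It
> > compiles against Avro 1.9+/1.11; the notes about 1.8 behaviour are from
> > memory rather than from anything decided here, so worth double-checking:
> >
> >   import org.apache.avro.Schema;
> >   import org.apache.avro.SchemaBuilder;
> >
> >   public class FieldDefaultSketch {
> >     public static void main(String[] args) {
> >       // Avro 1.9+ style: field defaults are plain Java Objects.
> >       Schema schema = SchemaBuilder.record("Example").fields()
> >           .name("count").type().intType().intDefault(0)
> >           .endRecord();
> >
> >       // defaultVal() returns an Object on 1.9+. The old defaultValue()
> >       // accessor returned an org.codehaus.jackson.JsonNode and is gone,
> >       // which is the sort of codehaus -> fasterxml exposure that makes a
> >       // single public API hard to keep source-compatible across both lines.
> >       Object defaultValue = schema.getField("count").defaultVal();
> >       System.out.println(defaultValue); // prints 0
> >     }
> >   }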
> >
> > For the testing, I was hoping that we could leverage a profile, similar
> > to what we do with Hadoop 2
> > <https://github.com/apache/parquet-java/blob/312a15f53a011d1dc4863df196c0169bdf6db629/pom.xml#L638-L643>.
> >
> > Both proposals are great, and happy to help!
> >
> > Kind regards,
> > Fokko
> >
> >
> >
> > On Tue, 27 Aug 2024 at 10:04, Gábor Szádovszky <[email protected]> wrote:
> >
> > > Hi Claire,
> > >
> > > Thanks for bringing this up.
> > >
> > > Since Avro has incompatibilities between these releases (which is
> > > natural, since the second number of an Avro version is considered the
> > > major one), we can only state compatibility with a version if we
> > > actually test with it. So, I would vote for your second proposal, or
> > > even both.
> > >
> > > Which Avro versions do you think we should support? (I think we need
> > > to support the latest one, plus whichever older major versions we
> > > consider required.)
> > >
> > > I am not sure whether we need separate modules to actually be released
> > > for the different Avro versions, or whether this is only required for
> > > testing. In the first case, it will be quite obvious which Avro version
> > > we support, since it will be part of the package naming.
> > >
> > > If you want to invest effort in this, I am happy to help with
> > > reviewing.
> > >
> > > Cheers,
> > > Gabor
> > >
> > > On Mon, 26 Aug 2024 at 19:01, Claire McGinty <[email protected]>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I wanted to start a thread discussing Avro cross-version support in
> > > > parquet-java. The parquet-avro module has been on Avro 1.11 since the
> > > > 1.13 release, but since then we've made fixes and added feature
> > > > support for Avro 1.8 APIs
> > > > (ex1 <https://github.com/apache/parquet-java/pull/2957>,
> > > > ex2 <https://github.com/apache/parquet-java/pull/2993>).
> > > >
> > > > Most of the Avro APIs referenced by parquet-avro are
> > > > cross-version compatible, with a few exceptions:
> > > >
> > > >    - Evolution of the Schema constructor APIs
> > > >    - New logical types (e.g., local timestamp and UUID)
> > > >    - Renamed logical-type conversion helpers
> > > >    - Generated code for datetime types using Java Time vs. Joda Time
> > > >    for setters/getters
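> > > >
> > > > As one concrete illustration of the renamed conversion helpers, a small
> > > > sketch that compiles against Avro 1.9+/1.11; the 1.8 class name in the
> > > > comment is recalled from memory, so worth double-checking:
> > > >
> > > >   import java.time.Instant;
> > > >
> > > >   import org.apache.avro.LogicalTypes;
> > > >   import org.apache.avro.Schema;
> > > >   import org.apache.avro.data.TimeConversions;
> > > >
> > > >   public class TimestampConversionSketch {
> > > >     public static void main(String[] args) {
> > > >       Schema tsSchema = LogicalTypes.timestampMillis()
> > > >           .addToSchema(Schema.create(Schema.Type.LONG));
> > > >
> > > >       // Avro 1.9+: TimestampMillisConversion, built on java.time.Instant.
> > > >       // On Avro 1.8 the rough equivalent was
> > > >       // TimeConversions.TimestampConversion, built on Joda Time, so code
> > > >       // (and generated SpecificRecords) written against one does not
> > > >       // compile against the other.
> > > >       TimeConversions.TimestampMillisConversion conversion =
> > > >           new TimeConversions.TimestampMillisConversion();
> > > >
> > > >       long millis = conversion.toLong(
> > > >           Instant.now(), tsSchema, LogicalTypes.timestampMillis());
> > > >       System.out.println(millis);
> > > >     }
> > > >   }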
> > > >
> > > > Some of these are hard to catch when Parquet is compiled and tested
> > > > with Avro 1.11 only. Additionally, as a user who currently relies
> > > > mostly on Avro 1.8, I'm not sure how much longer Parquet will
> > > > continue to support it.
> > > >
> > > > I have two proposals to build confidence and clarity around
> > > > parquet-avro:
> > > >
> > > >    - Codifying in the parquet-avro documentation
> > > >    <https://github.com/apache/parquet-java/blob/master/parquet-avro/README.md>
> > > >    which Avro versions are officially supported and which are
> > > >    deprecated/explicitly not supported
> > > >    - Adding some kind of automated testing with all supported Avro
> > > >    versions. This is a bit tricky because, as I mentioned, the
> > > >    generated SpecificRecord classes use incompatible logical-type APIs
> > > >    across Avro versions, so we'd have to find a way to invoke
> > > >    avro-compiler / load the Avro core library for different
> > > >    versions... this would probably require a multi-module setup.
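> > > >
> > > > One rough idea for those shared tests (an assumption on my part, not
> > > > something already in parquet-java): detect which Avro line is on the
> > > > classpath via reflection and skip version-specific assertions, so the
> > > > same test sources can run under several build profiles:
> > > >
> > > >   import static org.junit.Assume.assumeTrue;
> > > >
> > > >   import org.junit.Test;
> > > >
> > > >   public class AvroVersionAwareTest {
> > > >
> > > >     // The java.time-based conversion only exists on Avro 1.9+, so its
> > > >     // presence is a cheap way to tell which line the tests run against.
> > > >     private static final boolean AVRO_19_PLUS = classPresent(
> > > >         "org.apache.avro.data.TimeConversions$TimestampMillisConversion");
> > > >
> > > >     private static boolean classPresent(String name) {
> > > >       try {
> > > >         Class.forName(name);
> > > >         return true;
> > > >       } catch (ClassNotFoundException e) {
> > > >         return false;
> > > >       }
> > > >     }
> > > >
> > > >     @Test
> > > >     public void newerLogicalTypesRoundTrip() {
> > > >       // Skipped on Avro 1.8, where the newer logical types don't exist.
> > > >       assumeTrue("requires Avro 1.9+ on the test classpath", AVRO_19_PLUS);
> > > >       // ... assertions exercising the newer logical types would go here ...
> > > >     }
> > > >   }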
> > > >
> > > > I'd love to know what the Parquet community thinks about these ideas.
> > > > Additionally, I'm interested to learn more about which Avro versions
> > > > other Parquet users rely on. It seems like there's a lot of variance
> > > > across the data ecosystem--Spark keeps up to date with the latest Avro
> > > > version, Hadoop has Avro 1.9 pinned, and Apache Beam used to be
> > > > tightly coupled with 1.8 but has recently refactored to be
> > > > version-agnostic.
> > > >
> > > > Best,
> > > > Claire
> > > >
> > >
> >
>
