On Sat, 23 May 2026 at 11:03, Ismaël Mejía <[email protected]> wrote:

> Hi everyone,
>
> I would like to propose raising the minimum Java version for Apache
> Parquet Java from 11 to 17. Java 17 is the current baseline LTS
> version and has been out since September 2021 -- over four and a half years.
>
> The broader ecosystem has already converged on Java 17 as the
> minimum:
>   - Apache Spark 4.x requires Java 17 (dropped Java 8/11).
>   - Apache Flink 2.x uses Java 17 by default and recommends it as the
>     version to run on.
>   - Apache Hadoop 3.5+ requires JDK 17 on the server side.
>

Hadoop 3.4.3 supports JDK 17+ on the client side but is still a bit patchy
on the server. Hadoop 3.5.0 is the pure Java 17 one.


>   - Apache Hive 4.2 requires JDK 21 as the minimum.
>

I think Trino is there too.


>
> As a library consumed by all of the above, Parquet Java staying on
> Java 11 provides diminishing returns while imposing real costs:
>   1. Build tooling compatibility. We already hit issues where the
>      Spotless formatter and Hadoop itself break on newer JDKs due to
>      removed APIs (Subject.getSubject() in JDK 23+, internal javac
>      methods in JDK 25).


The Subject.getSubject() issue should be fixed in Hadoop 3.4.3+
(https://issues.apache.org/jira/browse/HADOOP-19212); I'm not aware of others.


> Staying on Java 11 as the baseline makes it
>      harder to test and support the JDK versions our consumers
>      actually run on.
>

+ it complicates the Java test matrix in general.


>   2. Language and API improvements. Java 17 brings records, sealed
>      classes, text blocks, pattern matching for instanceof, etc. These
>      improve code readability and reduce boilerplate.
>

Text blocks are really good for inline schemas...
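
A rough sketch of what I mean, not taken from the Parquet codebase: the
class name, field names and the schema itself are made up for illustration.
It only shows that a text block yields the same string as the concatenated
form while being much easier to read.

```java
public class InlineSchemaExample {
    // Hypothetical schema written as a text block (standard since Java 15).
    // The closing delimiter sits at the same indent as the content, so the
    // common leading whitespace is stripped and a trailing newline is kept.
    public static final String SCHEMA = """
        message example {
          required int64 id;
          optional binary name (UTF8);
        }
        """;

    // The same schema written the pre-text-block way, for comparison.
    public static final String SCHEMA_CONCAT =
        "message example {\n"
        + "  required int64 id;\n"
        + "  optional binary name (UTF8);\n"
        + "}\n";

    public static void main(String[] args) {
        // Both forms produce an identical string.
        System.out.println(SCHEMA.equals(SCHEMA_CONCAT));
        System.out.print(SCHEMA);
    }
}
```

In test code, a string like this would typically be handed to
MessageTypeParser.parseMessageType(...); I've left that call out so the
sketch compiles without Parquet on the classpath.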


>   3. Performance. The JVM has had significant improvements in GC
>      (ZGC, Shenandoah), JIT compilation, and memory layout between
>      Java 11 and 17. Libraries compiled against a Java 17 baseline can
>      take advantage of these without workarounds.
>   4. Security. Java 11 no longer receives public security
>      updates. Requiring 17 aligns with the security posture expected
>      of a widely deployed data format library.
>
> The first commit on the PR [1] already enforces Java 17 via the Maven
> enforcer plugin and removes pre-17 workarounds (reflection hacks for
> DirectByteBuffer cleanup, etc.). The full build passes on JDK 17, 21,
> and 25.
>
> I don't expect this to be controversial given where the ecosystem is,
> but wanted to give the community a chance to discuss before we merge.
>
>

If Parquet 1.18 targets Java 17 only, then that lines up well with Spark 4,
Iceberg 10.x, and Flink 2.

But:

   1. What does Spark 3.5 require? Would a Java 17 requirement cause
   problems there?
   2. Will the project need to make any commitment to security/data
   integrity releases on a Java 11-compatible branch?
