On Sat, 23 May 2026 at 11:03, Ismaël Mejía <[email protected]> wrote:
> Hi everyone,
>
> I would like to propose raising the minimum Java version for Apache
> Parquet Java from 11 to 17. Java 17 is the current baseline LTS
> version and has been out since September 2021 -- nearly four years.
>
> The broader ecosystem has already converged on Java 17 as the
> minimum:
> - Apache Spark 4.x requires Java 17 (dropped Java 8/11).
> - Apache Flink 2.x uses Java 17 by default and recommends it as the
>   version to run on.
> - Apache Hadoop 3.5+ requires JDK 17 on the server side.

Hadoop 3.4.3 is JDK 17+ on the client, but still a bit patchy on the server. Hadoop 3.5.0 is the pure Java 17 one.

> - Apache Hive 4.2 requires JDK 21 as the minimum.

I think Trino is there too.

>
> As a library consumed by all of the above, Parquet Java staying on
> Java 11 provides diminishing returns while imposing real costs:
> 1. Build tooling compatibility. We already hit issues where the
>    Spotless formatter and Hadoop itself break on newer JDKs due to
>    removed APIs (Subject.getSubject() in JDK 23+, internal javac
>    methods in JDK 25).

That should be fixed in Hadoop 3.4.3+ (https://issues.apache.org/jira/browse/HADOOP-19212); I'm not aware of others.

>    Staying on Java 11 as the baseline makes it
>    harder to test and support the JDK versions our consumers
>    actually run on.

Plus it complicates the Java test matrix in general.

> 2. Language and API improvements. Java 17 brings records, sealed
>    classes, text blocks, pattern matching for instanceof, etc. These
>    improve code readability and reduce boilerplate.

Text blocks are really good for inline schemas...

> 3. Performance. The JVM has had significant improvements in GC
>    (ZGC, Shenandoah), JIT compilation, and memory layout between
>    Java 11 and 17. Libraries compiled against a Java 17 baseline can
>    take advantage of these without workarounds.
> 4. Security. Java 11 no longer receives public security
>    updates. Requiring 17 aligns with the security posture expected
>    of a widely deployed data format library.
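
To illustrate the text-block point, here is a minimal sketch. The schema and the class name are made up for the example and are not taken from the Parquet codebase. It compares the same schema string written as a text block and as the concatenation a Java 11 baseline forces on us. Either string could then be handed to something like MessageTypeParser.parseMessageType(), which I have left out so the snippet runs without parquet on the classpath.

```java
public class InlineSchemaExample {

    // Hypothetical schema written as a text block (final in Java 15, so
    // available on a Java 17 baseline). The indentation shared with the
    // closing delimiter is stripped by the compiler.
    public static final String SCHEMA = """
            message user {
              required int64 id;
              optional binary name (UTF8);
            }
            """;

    // The same schema as it has to be written on a Java 11 baseline.
    public static final String SCHEMA_JAVA11 =
            "message user {\n"
            + "  required int64 id;\n"
            + "  optional binary name (UTF8);\n"
            + "}\n";

    public static void main(String[] args) {
        // Both forms produce the same string; only the source readability differs.
        System.out.println(SCHEMA.equals(SCHEMA_JAVA11));
        System.out.print(SCHEMA);
    }
}
```

The benefit is mostly in test code, where schemas are pasted inline all the time.
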
>
> The first commit on the PR [1] already enforces Java 17 via the Maven
> enforcer plugin and removes pre-17 workarounds (reflection hacks for
> DirectByteBuffer cleanup, etc.). The full build passes on JDK 17, 21,
> and 25.
>
> I don't expect this to be controversial given where the ecosystem is,
> but wanted to give the community a chance to discuss before we merge.

If Parquet 1.18 targets Java 17 only, then that lines up well with Spark 4, Iceberg 10.x, and Flink 2. But:

1. What does Spark 3.5 require? Would a Java 17 requirement cause problems there?
2. Will the project need to make any commitment to security/data integrity releases on a Java 11 compatible branch?
