Hi all,


With Spark 3.4 now removed after the 1.11 release, and the Spark community
proposing
<https://docs.google.com/document/d/1gBoZ4KH5zQUWpgK3M7zAN7p6Glz4S_e9bO3PvQA9sQs/edit?tab=t.0#heading=h.vj8hviw7ebqz>
quarterly minor releases, I'd like to start a discussion on how Iceberg
should adapt its Spark version support strategy going forward.

*Where we are today*


On main we support three Spark versions: 3.5, 4.0, and 4.1. Our CI matrix
runs 16 jobs across these, which is already becoming a bottleneck
<https://github.com/apache/iceberg/issues/16397>.



Historically, we have deprecated and removed Spark versions in an ad-hoc
fashion. This worked with ~2 Spark minors per year, but with the new
quarterly Spark releases it may not scale.

As per the Spark SPIP, this is the upcoming release schedule:

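| Date         | Release | Maintenance | Notes          |
|--------------|---------|-------------|----------------|
| April 2026   | 4.2     | 6 months    | Non-LTS (Past) |
| July 2026    | 4.3     | 6 months    | Non-LTS        |
| October 2026 | 4.4     | 6 months    | Non-LTS        |
| January 2027 | 4.5     | 18 months   | LTS            |
| April 2027   | 5.0     | -           | Major          |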

This means 4 Spark minors per year, with each non-LTS minor getting only a
6-month maintenance window, and an LTS roughly once a year.

I propose we adopt a policy instead of making ad-hoc decisions. Some
options I see:


   1. *LTS + rolling window of 2 minors*: Support the current Spark LTS and
   the 2 most recent minors. For example, when 4.2 GA ships, add it and
   deprecate 4.0; when 4.3 ships, add it and deprecate 4.1. This provides a
   predictable cadence but also means a version add/drop every quarter.

   2. *LTS + selective minors*: Support the Spark LTS and choose minors
   that have meaningful DSv2 API changes, skipping versions that are
   incremental. More flexible but less predictable for users. (This is the
   current strategy.)
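
To make option 1 above a bit more concrete, here is a rough Python sketch of
the rolling-window bookkeeping. The function name `on_minor_release` and its
parameters are made up for illustration. It also assumes 3.5 plays the LTS
role, which the 4.2/4.3 example implies but does not state. It deliberately
does not model what happens when a new LTS ships, since that is still open.

```python
def on_minor_release(lts, minors, new_minor, window=2):
    """Return (supported, deprecated) after a new Spark minor ships.

    lts:       version treated as the current LTS (assumption: chosen by us)
    minors:    currently supported non-LTS minors, oldest first
    new_minor: the minor that just shipped
    window:    how many recent minors to keep next to the LTS
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    updated = list(minors) + [new_minor]
    kept = updated[-window:]        # the `window` most recent minors
    deprecated = updated[:-window]  # everything older falls out of support
    return [lts] + kept, deprecated


# Today on main: 3.5, 4.0, 4.1 (3.5 assumed to be the LTS slot).
print(on_minor_release("3.5", ["4.0", "4.1"], "4.2"))
# (['3.5', '4.1', '4.2'], ['4.0'])
print(on_minor_release("3.5", ["4.1", "4.2"], "4.3"))
# (['3.5', '4.2', '4.3'], ['4.1'])
```

With `window=2` the supported set never exceeds 3 versions, which matches the
cap suggested for CI.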


Any strategy must also account for the CI infra ceiling. Recent improvements
<https://github.com/apache/iceberg/issues/16397> have helped, but I think
we should support at most 3 versions to keep this under control.
