Yeah, we discussed this a little on
https://github.com/apache/iceberg/pull/14984.  We are trying, via DSv2, to
minimize use of internal Spark classes.  But given the CI challenges and the
lack of confidence in cross-version compatibility, it sounds like Option 1
(last LTS + 2 latest minors) is more likely (i.e., we can start thinking about
removing Spark 4.0 when adopting 4.2).  I'd love to hear thoughts if anyone
thinks otherwise.
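
As a rough illustration of Option 1, here is a minimal Python sketch of how the supported set could be derived from a release list. The function name `supported_versions`, the `(version, is_lts)` tuple layout, and the treatment of 3.5 as the current LTS are assumptions made for this example, not anything from the Iceberg or Spark build.

```python
def supported_versions(releases, window=2):
    """Return the versions to support: the latest LTS plus the `window` most recent non-LTS minors.

    `releases` is a list of (version, is_lts) tuples in release order.
    This layout is hypothetical and chosen only for illustration.
    """
    lts = [version for version, is_lts in releases if is_lts]
    non_lts = [version for version, is_lts in releases if not is_lts]

    keep = set(non_lts[-window:])
    if lts:
        keep.add(lts[-1])

    # Preserve release order in the result.
    return [version for version, _ in releases if version in keep]


# Assumption: 3.5 is treated as the current LTS.
releases = [("3.5", True), ("4.0", False), ("4.1", False), ("4.2", False)]
print(supported_versions(releases))  # ['3.5', '4.1', '4.2']
```

With 4.2 added, 4.0 falls out of the window, which matches the example above of removing Spark 4.0 when adopting 4.2.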

Thanks
Szehon

On Mon, Jun 15, 2026 at 12:12 AM Cheng Pan <[email protected]> wrote:

> > would an Iceberg-Spark runtime jar built for 4.2 also work on Spark 4.5?
>
> I am not optimistic about this unless Iceberg adds a cross-version
> verification workflow that takes the Iceberg runtime jar compiled for Spark
> 4.2 and runs smoke tests against newer Spark distributions. This
> would consume a lot of CI resources, clearly defeating the purpose.
>
> Note that Iceberg uses a lot of Spark's internal Catalyst APIs, which have no
> compatibility guarantee; the SPIP only ensures that public APIs and
> user-facing behavior have no breaking changes across minor releases.
>
> Thanks,
> Cheng Pan
>
>
>
> On Jun 3, 2026, at 14:04, Anurag Mantripragada <
> [email protected]> wrote:
>
> Hi Kurtis, Szehon, and Jingyi,
>
> Kurtis, to your point, we do need to support two major versions (3.5 and
> 4.x) during this transition period. I hope Szehon’s explanation clarified
> your questions regarding API changes and the frequency of Spark minor
> version upgrades moving forward.
>
> Szehon and Jingyi, thank you for providing more context from the Spark
> side. Since CI is becoming a bottleneck, we should restrict support to one
> LTS and two minor versions.
>
> Regarding compatibility: if we have versions 4.1, 4.2, and 4.5 (the LTS
> and final version in the 4.x line), would an Iceberg-Spark runtime jar
> built for 4.2 also work on Spark 4.5?
>
> ~ Anurag
>
>
> On Mon, Jun 1, 2026 at 7:25 PM Szehon Ho <[email protected]> wrote:
>
>> Yes, the policy is that releases within the same Spark major version should
>> now avoid breaking changes: https://spark.apache.org/versioning-policy.html.
>> One additional detail is that the LTS is going to be the last minor of every
>> major version.
>>
>> But there is still an issue.  If Iceberg is eager to consume new Spark
>> APIs, then we still need multiple Spark-Iceberg versions.  For example, if we
>> consume new DSv2 features in Spark 4.3, Iceberg-Spark 4.3 won't run with
>> Spark 4.2.
>>
>> There is some advantage to Spark's backward compatibility within a major
>> version.  One Iceberg-Spark version will in theory work with future Spark
>> versions of the same major.  For example, we can have one Iceberg-Spark jar
>> built for Spark 4.0 (or any 'selected minor' as Anurag said), and in
>> theory it will work with all future Spark versions in the same major (e.g.,
>> 4.1, 4.2).  So supporting something like (1) the last Spark LTS, (2) the first
>> Spark version of the current major, and (3) the last Spark version of the
>> current major does increase coverage, but it becomes a bit more complex for
>> users and the community (more cross-version testing is needed, which means
>> more CI, and users need to understand it).
>>
>> Thanks,
>> Szehon
>>
>>
>> On Mon, Jun 1, 2026 at 4:52 PM <[email protected]> wrote:
>>
>>> Hi Anurag,
>>>
>>> Thank you for calling this out, TIL about Spark quarterly updates!
>>>
>>> A few naive questions: do we need to support more than 2 major Spark
>>> versions in CI?
>>> Is it correct to assume API interface changes should only happen across
>>> major version updates?
>>> Is the Spark community doing this with the built-in assumption that
>>> minor version upgrades will be relatively easy going forward?
>>>
>>> Thank you,
>>> Kurtis C. Wright
>>>
>>> On Jun 1, 2026, at 15:25, Anurag Mantripragada <
>>> [email protected]> wrote:
>>>
>>> 
>>> Hi all,
>>>
>>>
>>>
>>> With Spark 3.4 now removed after the 1.11 release, and the Spark
>>> community proposing
>>> <https://docs.google.com/document/d/1gBoZ4KH5zQUWpgK3M7zAN7p6Glz4S_e9bO3PvQA9sQs/edit?tab=t.0#heading=h.vj8hviw7ebqz>
>>> quarterly minor releases, I'd like to start a discussion on how Iceberg
>>> should adapt its Spark version support strategy going forward.
>>>
>>> *Where we are today*
>>>
>>>
>>> On main we support three Spark versions: 3.5, 4.0, and 4.1. Our CI
>>> matrix runs 16 jobs across these, which is already becoming a bottleneck
>>> <https://github.com/apache/iceberg/issues/16397>.
>>>
>>>
>>>
>>> Historically, we have deprecated and removed Spark versions in an ad-hoc
>>> fashion. This worked with ~2 Spark minors per year, but with the new
>>> quarterly Spark releases it may not scale.
>>>
>>> Per the Spark SPIP, this is the upcoming release schedule:
>>>
>>> Date          Release  Maintenance  Notes
>>> April 2026    4.2      6 months     Non-LTS (Past)
>>> July 2026     4.3      6 months     Non-LTS
>>> October 2026  4.4      6 months     Non-LTS
>>> January 2027  4.5      18 months    LTS
>>> April 2027    5.0      -            Major
>>>
>>> This means 4 Spark minors per year, each with only a 6-month maintenance
>>> window, and an LTS roughly once a year.
>>>
>>> I propose we adopt a policy instead of making ad-hoc decisions. Some
>>> options I see:
>>>
>>>
>>>    1. *LTS + rolling window of 2 minors*: Support the current Spark LTS
>>>    and the 2 most recent minors. For example, when 4.2 GA ships, add it and
>>>    deprecate 4.0, and when 4.3 ships, add it and deprecate 4.1. This provides
>>>    a predictable cadence but also means a version add/drop every quarter.
>>>
>>>    2. *LTS + selective minors*: Support the Spark LTS and choose
>>>    minors that have meaningful DSv2 API changes, skipping versions that are
>>>    incremental. More flexible but less predictable for users. (This is the
>>>    current strategy.)
>>>
>>>
>>> Any strategy must account for the CI infra ceiling too. Recent improvements
>>> <https://github.com/apache/iceberg/issues/16397> have helped, but I
>>> think we should support at most 3 versions to keep this under control.
>>>
>>>
>>>
>
