Currently, the 1.10.0 milestone has no open PRs: https://github.com/apache/iceberg/milestone/54
The variant PR was merged over the past two weeks. There are still some variant-testing-related PRs, which are probably not blockers for the 1.10.0 release:
* Spark variant read: https://github.com/apache/iceberg/pull/13219
* use short strings: https://github.com/apache/iceberg/pull/13284

We are still waiting on the following two changes:
* Anton's fix for the data frame join using the same snapshot, which will introduce a slight behavior change in Spark 4.0.
* Unknown type support. Fokko raised a discussion thread on a blocking issue.

Did I miss anything else?

On Sat, Jul 26, 2025 at 5:52 AM Fokko Driesprong <fo...@apache.org> wrote: > Hey all, > > The read path for the UnknownType needs some community discussion. I've > raised a separate thread > <https://lists.apache.org/thread/gq9lyndb574ptq7vkz83zgkp1lx7vp5x>. PTAL > > Kind regards from Belgium, > Fokko > > On Sat, Jul 26, 2025 at 00:58 Ryan Blue <rdb...@gmail.com> wrote: > >> I thought that we said we wanted to get support out for v3 features in >> this release unless there is some reasonable blocker, like Spark not having >> geospatial types. To me, I think that means we should aim to get variant >> and unknown done so that we have a complete implementation with a major >> engine. And it should not be particularly difficult to get unknown done so >> I'd opt to get it in. >> >> On Fri, Jul 25, 2025 at 11:28 AM Steven Wu <stevenz...@gmail.com> wrote: >> >>> > I believe we also wanted to get in at least the read path for >>> UnknownType. Fokko has a WIP PR >>> <https://github.com/apache/iceberg/pull/13445> for that. >>> I thought in the community sync the consensus is that this is not a >>> blocker, because it is a new feature implementation. If it is ready, it >>> will be included. >>> >>> On Fri, Jul 25, 2025 at 9:43 AM Kevin Liu <kevinjq...@apache.org> wrote: >>> >>>> I think Fokko's OOO. Should we help with that PR? 
>>>> >>>> On Fri, Jul 25, 2025 at 9:38 AM Eduard Tudenhöfner < >>>> etudenhoef...@apache.org> wrote: >>>> >>>>> I believe we also wanted to get in at least the read path for >>>>> UnknownType. Fokko has a WIP PR >>>>> <https://github.com/apache/iceberg/pull/13445> for that. >>>>> >>>>> On Fri, Jul 25, 2025 at 6:13 PM Steven Wu <stevenz...@gmail.com> >>>>> wrote: >>>>> >>>>>> 3. Spark: fix data frame join based on different versions of the same >>>>>> table that may lead to weird results. Anton is working on a fix. It >>>>>> requires a small behavior change (table state may be stale up to refresh >>>>>> interval). Hence it is better to include it in the 1.10.0 release where >>>>>> Spark 4.0 is first supported. >>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is very >>>>>> close and will prioritize the review. >>>>>> >>>>>> We still have the above two issues pending. 3 doesn't have a PR yet. >>>>>> PR for 4 is not associated with the milestone yet. >>>>>> >>>>>> On Fri, Jul 25, 2025 at 9:02 AM Kevin Liu <kevinjq...@apache.org> >>>>>> wrote: >>>>>> >>>>>>> Thanks everyone for the review. The 2 PRs are both merged. >>>>>>> Looks like there's only 1 PR left in the 1.10 milestone >>>>>>> <https://github.com/apache/iceberg/milestone/54> :) >>>>>>> >>>>>>> Best, >>>>>>> Kevin Liu >>>>>>> >>>>>>> On Thu, Jul 24, 2025 at 7:44 PM Manu Zhang <owenzhang1...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Thanks Kevin. The first change is not in the versioned doc so it >>>>>>>> can be released anytime. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Manu >>>>>>>> >>>>>>>> On Fri, Jul 25, 2025 at 4:21 AM Kevin Liu <kevinjq...@apache.org> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> The 3 PRs above are merged. Thanks everyone for the review. >>>>>>>>> >>>>>>>>> I've added 2 more PRs to the 1.10 milestone. These are both >>>>>>>>> nice-to-haves. 
>>>>>>>>> - docs: add subpage for REST Catalog Spec in "Specification" >>>>>>>>> #13521 <https://github.com/apache/iceberg/pull/13521> >>>>>>>>> - REST-Fixture: Ensure strict mode on jdbc catalog for rest >>>>>>>>> fixture #13599 <https://github.com/apache/iceberg/pull/13599> >>>>>>>>> >>>>>>>>> The first one changes the link for "REST Catalog Spec" on the left >>>>>>>>> nav of https://iceberg.apache.org/spec/ from the swagger.io link >>>>>>>>> to a dedicated page for IRC. >>>>>>>>> The second one fixes the default behavior of >>>>>>>>> `iceberg-rest-fixture` image to align with the general expectation >>>>>>>>> when >>>>>>>>> creating a table in a catalog. >>>>>>>>> >>>>>>>>> Please take a look. I would like to have both of these as part of >>>>>>>>> the 1.10 release. >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Kevin Liu >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Jul 23, 2025 at 1:31 PM Kevin Liu <kevinjq...@apache.org> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Here are the 3 PRs to add corresponding tests. >>>>>>>>>> https://github.com/apache/iceberg/pull/13648 >>>>>>>>>> https://github.com/apache/iceberg/pull/13649 >>>>>>>>>> https://github.com/apache/iceberg/pull/13650 >>>>>>>>>> >>>>>>>>>> I've tagged them with the 1.10 milestone, waiting for CI to >>>>>>>>>> complete :) >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Kevin Liu >>>>>>>>>> >>>>>>>>>> On Wed, Jul 23, 2025 at 1:08 PM Steven Wu <stevenz...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Kevin, thanks for checking that. I will take a look at your >>>>>>>>>>> backport PRs. Can you add them to the 1.10.0 milestone? >>>>>>>>>>> >>>>>>>>>>> On Wed, Jul 23, 2025 at 12:27 PM Kevin Liu < >>>>>>>>>>> kevinjq...@apache.org> wrote: >>>>>>>>>>> >>>>>>>>>>>> Thanks again for driving this Steven! We're very close!! >>>>>>>>>>>> >>>>>>>>>>>> As mentioned in the community sync today, I wanted to verify >>>>>>>>>>>> feature parity between Spark 3.5 and Spark 4.0 for this release. 
>>>>>>>>>>>> I was able to verify that Spark 3.5 and Spark 4.0 have feature >>>>>>>>>>>> parity for this upcoming release. More details in the other >>>>>>>>>>>> devlist thread >>>>>>>>>>>> https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Kevin Liu >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Jul 23, 2025 at 12:17 PM Steven Wu < >>>>>>>>>>>> stevenz...@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Another update on the release. >>>>>>>>>>>>> >>>>>>>>>>>>> The existing blocker PRs are almost done. >>>>>>>>>>>>> >>>>>>>>>>>>> During today's community sync, we identified the following >>>>>>>>>>>>> issues/PRs to be included in the 1.10.0 release. >>>>>>>>>>>>> >>>>>>>>>>>>> 1. backport of PR 13100 to the main branch. I have created >>>>>>>>>>>>> a cherry-pick PR >>>>>>>>>>>>> <https://github.com/apache/iceberg/pull/13647> for that. >>>>>>>>>>>>> There is a one line difference compared to the original PR due >>>>>>>>>>>>> to the >>>>>>>>>>>>> removal of the deprecated RemoveSnapshot class in main branch >>>>>>>>>>>>> for 1.10.0 >>>>>>>>>>>>> target. Amogh has suggested using RemoveSnapshots with a >>>>>>>>>>>>> single snapshot >>>>>>>>>>>>> id, which should be supported by all REST catalog servers. >>>>>>>>>>>>> 2. Flink compaction doesn't support row lineage. Fail the >>>>>>>>>>>>> compaction for V3 tables. I created a PR >>>>>>>>>>>>> <https://github.com/apache/iceberg/pull/13646> for that. >>>>>>>>>>>>> Will backport after it is merged. >>>>>>>>>>>>> 3. Spark: fix data frame join based on different versions >>>>>>>>>>>>> of the same table that may lead to weird results. Anton is >>>>>>>>>>>>> working on a >>>>>>>>>>>>> fix. It requires a small behavior change (table state may be >>>>>>>>>>>>> stale up to >>>>>>>>>>>>> refresh interval). Hence it is better to include it in the >>>>>>>>>>>>> 1.10.0 release >>>>>>>>>>>>> where Spark 4.0 is first supported. >>>>>>>>>>>>> 4. Variant support in core and Spark 4.0. 
Ryan thinks this >>>>>>>>>>>>> is very close and will prioritize the review. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> steven >>>>>>>>>>>>> >>>>>>>>>>>>> The 1.10.0 milestone can be found here. >>>>>>>>>>>>> https://github.com/apache/iceberg/milestone/54 >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu < >>>>>>>>>>>>> stevenz...@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Ajantha/Robin, thanks for the note. We can include the PR in >>>>>>>>>>>>>> the 1.10.0 milestone. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt >>>>>>>>>>>>>> <ro...@confluent.io.invalid> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks Ajantha. Just to confirm, from a Confluent point of >>>>>>>>>>>>>>> view, we will not be able to publish the connector on Confluent >>>>>>>>>>>>>>> Hub until >>>>>>>>>>>>>>> this CVE[1] is fixed. >>>>>>>>>>>>>>> Since we would not publish a snapshot build, if the fix >>>>>>>>>>>>>>> doesn't make it into 1.10 then we'd have to wait for 1.11 (or a >>>>>>>>>>>>>>> dot release >>>>>>>>>>>>>>> of 1.10) to be able to include the connector on Confluent Hub. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, Robin. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>> https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat < >>>>>>>>>>>>>>> ajanthab...@gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I have approached Confluent people >>>>>>>>>>>>>>>> <https://github.com/apache/iceberg/issues/10745#issuecomment-3058281281> >>>>>>>>>>>>>>>> to help us publish the OSS Kafka Connect Iceberg sink plugin. >>>>>>>>>>>>>>>> It seems we have a CVE from dependency that blocks us from >>>>>>>>>>>>>>>> publishing the plugin. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Please include the below PR for 1.10.0 release which fixes >>>>>>>>>>>>>>>> that. 
>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/13561 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> - Ajantha >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu < >>>>>>>>>>>>>>>> stevenz...@gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> > Engines may model operations as deleting/inserting rows >>>>>>>>>>>>>>>>> or as modifications to rows that preserve row ids. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Manu, I agree this sentence probably lacks some context. >>>>>>>>>>>>>>>>> The first half (as deleting/inserting rows) is probably >>>>>>>>>>>>>>>>> about the row lineage handling with equality deletes, which >>>>>>>>>>>>>>>>> is described in >>>>>>>>>>>>>>>>> another place. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "Row lineage does not track lineage for rows updated via >>>>>>>>>>>>>>>>> Equality >>>>>>>>>>>>>>>>> Deletes >>>>>>>>>>>>>>>>> <https://iceberg.apache.org/spec/#equality-delete-files>, >>>>>>>>>>>>>>>>> because engines using equality deletes avoid reading existing >>>>>>>>>>>>>>>>> data before >>>>>>>>>>>>>>>>> writing changes and can't provide the original row ID for the >>>>>>>>>>>>>>>>> new rows. >>>>>>>>>>>>>>>>> These updates are always treated as if the existing row was >>>>>>>>>>>>>>>>> completely >>>>>>>>>>>>>>>>> removed and a unique new row was added." >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang < >>>>>>>>>>>>>>>>> owenzhang1...@gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks Steven, I missed that part but the following >>>>>>>>>>>>>>>>>> sentence is a bit hard to understand (maybe just me) >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Engines may model operations as deleting/inserting rows >>>>>>>>>>>>>>>>>> or as modifications to rows that preserve row ids. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Can you please help to explain? 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Steven Wu <stevenz...@gmail.com>于2025年7月15日 周二04:41写道: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Manu >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The spec already covers the row lineage carry over (for >>>>>>>>>>>>>>>>>>> replace) >>>>>>>>>>>>>>>>>>> https://iceberg.apache.org/spec/#row-lineage >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "When an existing row is moved to a different data file >>>>>>>>>>>>>>>>>>> for any reason, writers should write _row_id and >>>>>>>>>>>>>>>>>>> _last_updated_sequence_number according to the >>>>>>>>>>>>>>>>>>> following rules:" >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>> Steven >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu < >>>>>>>>>>>>>>>>>>> stevenz...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> another update on the release. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> We have one open PR left for the 1.10.0 milestone >>>>>>>>>>>>>>>>>>>> <https://github.com/apache/iceberg/milestone/54> (with >>>>>>>>>>>>>>>>>>>> 25 closed PRs). Amogh is actively working on the last >>>>>>>>>>>>>>>>>>>> blocker PR. >>>>>>>>>>>>>>>>>>>> Spark 4.0: Preserve row lineage information on >>>>>>>>>>>>>>>>>>>> compaction >>>>>>>>>>>>>>>>>>>> <https://github.com/apache/iceberg/pull/13555> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I will publish a release candidate after the above >>>>>>>>>>>>>>>>>>>> blocker is merged and backported. 
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>> Steven >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang < >>>>>>>>>>>>>>>>>>>> owenzhang1...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hi Amogh, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Is it defined in the table spec that "replace" >>>>>>>>>>>>>>>>>>>>> operation should carry over existing lineage info >>>>>>>>>>>>>>>>>>>>> insteading of assigning >>>>>>>>>>>>>>>>>>>>> new IDs? If not, we'd better firstly define it in spec >>>>>>>>>>>>>>>>>>>>> because all engines >>>>>>>>>>>>>>>>>>>>> and implementations need to follow it. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar < >>>>>>>>>>>>>>>>>>>>> 2am...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> One other area I think we need to make sure works >>>>>>>>>>>>>>>>>>>>>> with row lineage before release is data file compaction. >>>>>>>>>>>>>>>>>>>>>> At >>>>>>>>>>>>>>>>>>>>>> the moment, >>>>>>>>>>>>>>>>>>>>>> <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44> >>>>>>>>>>>>>>>>>>>>>> it >>>>>>>>>>>>>>>>>>>>>> looks like compaction will read the records from the >>>>>>>>>>>>>>>>>>>>>> data files without >>>>>>>>>>>>>>>>>>>>>> projecting the lineage fields. What this means is that >>>>>>>>>>>>>>>>>>>>>> on write of the new >>>>>>>>>>>>>>>>>>>>>> compacted data files we'd be losing the lineage >>>>>>>>>>>>>>>>>>>>>> information. There's no >>>>>>>>>>>>>>>>>>>>>> data change in a compaction but we do need to make sure >>>>>>>>>>>>>>>>>>>>>> the lineage info >>>>>>>>>>>>>>>>>>>>>> from carried over records is materialized in the newly >>>>>>>>>>>>>>>>>>>>>> compacted files so >>>>>>>>>>>>>>>>>>>>>> they don't get new IDs or inherit the new file sequence >>>>>>>>>>>>>>>>>>>>>> number. 
I'm working >>>>>>>>>>>>>>>>>>>>>> on addressing this as well, but I'd call this out as a >>>>>>>>>>>>>>>>>>>>>> blocker as well. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> *Robin Moffatt* >>>>>>>>>>>>>>> *Sr. Principal Advisor, Streaming Data Technologies* >>>>>>>>>>>>>>> >>>>>>>>>>>>>>
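
[Editor's note] The row-lineage carry-over rule quoted in the thread above can be sketched as follows. This is an illustrative model only, not Iceberg's actual compaction code: the field names `_row_id` and `_last_updated_sequence_number` and the inheritance rules (a null `_row_id` inherits the source file's `first_row_id` plus the row's position; a null `_last_updated_sequence_number` inherits the source file's data sequence number) come from the Iceberg v3 spec's row-lineage section, while the helper function, dict shapes, and key names are invented for this sketch.

```python
# Illustrative sketch (not Iceberg's implementation) of the v3 row-lineage
# rule: when a row is moved to a new data file (e.g. by compaction), its
# lineage fields must be materialized and carried over, not re-assigned,
# so the row does not get a new ID or inherit the new file's sequence number.

def carry_over_lineage(row, source_file):
    """Return the lineage fields a writer should persist for a row moved
    out of source_file into a newly written data file.

    row: dict with 'pos' (position in the source file) and optional
         '_row_id' / '_last_updated_sequence_number' values.
    source_file: dict with 'first_row_id' and 'data_sequence_number'.
    """
    # A null _row_id is inherited as first_row_id + position; once
    # materialized it must never change for the lifetime of the row.
    row_id = row.get("_row_id")
    if row_id is None:
        row_id = source_file["first_row_id"] + row["pos"]

    # A null _last_updated_sequence_number inherits the source data file's
    # sequence number. Compaction is not a data change, so the value is
    # carried over rather than bumped to the new file's sequence number.
    last_updated = row.get("_last_updated_sequence_number")
    if last_updated is None:
        last_updated = source_file["data_sequence_number"]

    return {"_row_id": row_id, "_last_updated_sequence_number": last_updated}
```

This models why the compaction fix matters: a rewrite that reads rows without projecting these two fields cannot apply the carry-over rule, and the rewritten rows would silently be assigned fresh lineage.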