3. Spark: fix DataFrame join based on different versions of the same
table, which may lead to incorrect results (see the sketch below). Anton
is working on a fix. It requires a small behavior change (table state may
be stale up to the refresh interval). Hence it is better to include it in
the 1.10.0 release, where Spark 4.0 is first supported.
4. Variant support in core and Spark 4.0. Ryan thinks this is very close
and will prioritize the review.

We still have the above two issues pending. Item 3 doesn't have a PR yet.
The PR for item 4 is not yet associated with the milestone.
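For context on item 3, a minimal sketch, assuming a hypothetical
catalog.db.events table, of the self-join pattern that can observe two
different versions of the same table; this only illustrates the scenario
and is not the reproduction or Anton's fix:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class SelfJoinSketch {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("self-join-sketch")
            .getOrCreate();

        // Two logical scans of the same Iceberg table in one query.
        Dataset<Row> left = spark.table("catalog.db.events");
        Dataset<Row> right = spark.table("catalog.db.events");

        // If the table state is refreshed between planning the two scans
        // (e.g., a concurrent commit lands), each side can resolve to a
        // different snapshot, yielding inconsistent join results.
        left.join(right, "id").show();
      }
    }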
On Fri, Jul 25, 2025 at 9:02 AM Kevin Liu <kevinjq...@apache.org> wrote:

> Thanks everyone for the review. The 2 PRs are both merged.
> Looks like there's only 1 PR left in the 1.10 milestone
> <https://github.com/apache/iceberg/milestone/54> :)
>
> Best,
> Kevin Liu
>
> On Thu, Jul 24, 2025 at 7:44 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>
>> Thanks Kevin. The first change is not in the versioned doc, so it can
>> be released anytime.
>>
>> Regards,
>> Manu
>>
>> On Fri, Jul 25, 2025 at 4:21 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>
>>> The 3 PRs above are merged. Thanks everyone for the review.
>>>
>>> I've added 2 more PRs to the 1.10 milestone. These are both
>>> nice-to-haves.
>>> - docs: add subpage for REST Catalog Spec in "Specification" #13521
>>> <https://github.com/apache/iceberg/pull/13521>
>>> - REST-Fixture: Ensure strict mode on jdbc catalog for rest fixture
>>> #13599 <https://github.com/apache/iceberg/pull/13599>
>>>
>>> The first one changes the link for "REST Catalog Spec" in the left nav
>>> of https://iceberg.apache.org/spec/ from the swagger.io link to a
>>> dedicated page for IRC.
>>> The second one fixes the default behavior of the `iceberg-rest-fixture`
>>> image to align with the general expectation when creating a table in a
>>> catalog.
>>>
>>> Please take a look. I would like to have both of these as part of the
>>> 1.10 release.
>>>
>>> Best,
>>> Kevin Liu
>>>
>>> On Wed, Jul 23, 2025 at 1:31 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>
>>>> Here are the 3 PRs to add the corresponding tests:
>>>> https://github.com/apache/iceberg/pull/13648
>>>> https://github.com/apache/iceberg/pull/13649
>>>> https://github.com/apache/iceberg/pull/13650
>>>>
>>>> I've tagged them with the 1.10 milestone, waiting for CI to complete :)
>>>>
>>>> Best,
>>>> Kevin Liu
>>>>
>>>> On Wed, Jul 23, 2025 at 1:08 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>
>>>>> Kevin, thanks for checking that. I will take a look at your backport
>>>>> PRs. Can you add them to the 1.10.0 milestone?
>>>>>
>>>>> On Wed, Jul 23, 2025 at 12:27 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>
>>>>>> Thanks again for driving this, Steven! We're very close!!
>>>>>>
>>>>>> As mentioned in the community sync today, I wanted to verify feature
>>>>>> parity between Spark 3.5 and Spark 4.0 for this release.
>>>>>> I was able to verify that Spark 3.5 and Spark 4.0 have feature parity
>>>>>> for this upcoming release. More details in the other devlist thread:
>>>>>> https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f
>>>>>>
>>>>>> Thanks,
>>>>>> Kevin Liu
>>>>>>
>>>>>> On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>
>>>>>>> Another update on the release.
>>>>>>>
>>>>>>> The existing blocker PRs are almost done.
>>>>>>>
>>>>>>> During today's community sync, we identified the following
>>>>>>> issues/PRs to be included in the 1.10.0 release.
>>>>>>>
>>>>>>> 1. Backport of PR 13100 to the main branch. I have created a
>>>>>>> cherry-pick PR <https://github.com/apache/iceberg/pull/13647> for
>>>>>>> that. There is a one-line difference compared to the original PR,
>>>>>>> due to the removal of the deprecated RemoveSnapshot class in the
>>>>>>> main branch for the 1.10.0 target. Amogh has suggested using
>>>>>>> RemoveSnapshots with a single snapshot id, which should be supported
>>>>>>> by all REST catalog servers (see the sketch below).
>>>>>>> 2. Flink compaction doesn't support row lineage. Fail the
>>>>>>> compaction for V3 tables. I created a PR
>>>>>>> <https://github.com/apache/iceberg/pull/13646> for that. Will
>>>>>>> backport after it is merged.
>>>>>>> 3. Spark: fix DataFrame join based on different versions of the
>>>>>>> same table, which may lead to incorrect results. Anton is working on
>>>>>>> a fix. It requires a small behavior change (table state may be stale
>>>>>>> up to the refresh interval). Hence it is better to include it in the
>>>>>>> 1.10.0 release, where Spark 4.0 is first supported.
>>>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is very
>>>>>>> close and will prioritize the review.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> steven
>>>>>>>
>>>>>>> The 1.10.0 milestone can be found here:
>>>>>>> https://github.com/apache/iceberg/milestone/54
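To make the suggestion in item 1 concrete, a hedged sketch of expiring a
single snapshot id through the public ExpireSnapshots API (RemoveSnapshots
is its implementation in core); the table identifier and catalog wiring
are hypothetical:

    import org.apache.iceberg.Table;
    import org.apache.iceberg.catalog.Catalog;
    import org.apache.iceberg.catalog.TableIdentifier;

    public class ExpireSingleSnapshot {
      static void expireOne(Catalog catalog, long snapshotId) {
        // Hypothetical table; any Catalog implementation works here.
        Table table = catalog.loadTable(TableIdentifier.of("db", "events"));
        table.expireSnapshots()            // returns ExpireSnapshots
            .expireSnapshotId(snapshotId)  // target exactly one snapshot id
            .commit();                     // single metadata commit
      }
    }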
>>>>>>> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Ajantha/Robin, thanks for the note. We can include the PR in the
>>>>>>>> 1.10.0 milestone.
>>>>>>>>
>>>>>>>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt
>>>>>>>> <ro...@confluent.io.invalid> wrote:
>>>>>>>>
>>>>>>>>> Thanks Ajantha. Just to confirm, from a Confluent point of view,
>>>>>>>>> we will not be able to publish the connector on Confluent Hub
>>>>>>>>> until this CVE [1] is fixed.
>>>>>>>>> Since we would not publish a snapshot build, if the fix doesn't
>>>>>>>>> make it into 1.10, we'd have to wait for 1.11 (or a dot release
>>>>>>>>> of 1.10) to be able to include the connector on Confluent Hub.
>>>>>>>>>
>>>>>>>>> Thanks, Robin.
>>>>>>>>>
>>>>>>>>> [1] https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>>>>>>>>>
>>>>>>>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I have approached people at Confluent
>>>>>>>>>> <https://github.com/apache/iceberg/issues/10745#issuecomment-3058281281>
>>>>>>>>>> to help us publish the OSS Kafka Connect Iceberg sink plugin.
>>>>>>>>>> It seems we have a CVE from a dependency that blocks us from
>>>>>>>>>> publishing the plugin.
>>>>>>>>>>
>>>>>>>>>> Please include the PR below, which fixes that, in the 1.10.0 release:
>>>>>>>>>> https://github.com/apache/iceberg/pull/13561
>>>>>>>>>>
>>>>>>>>>> - Ajantha
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> > Engines may model operations as deleting/inserting rows or as
>>>>>>>>>>> modifications to rows that preserve row ids.
>>>>>>>>>>>
>>>>>>>>>>> Manu, I agree this sentence probably lacks some context. The
>>>>>>>>>>> first half (deleting/inserting rows) is probably about row
>>>>>>>>>>> lineage handling with equality deletes, which is described in
>>>>>>>>>>> another place:
>>>>>>>>>>>
>>>>>>>>>>> "Row lineage does not track lineage for rows updated via
>>>>>>>>>>> Equality Deletes
>>>>>>>>>>> <https://iceberg.apache.org/spec/#equality-delete-files>,
>>>>>>>>>>> because engines using equality deletes avoid reading existing
>>>>>>>>>>> data before writing changes and can't provide the original row
>>>>>>>>>>> ID for the new rows. These updates are always treated as if the
>>>>>>>>>>> existing row was completely removed and a unique new row was
>>>>>>>>>>> added."
>>>>>>>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks Steven. I missed that part, but the following sentence
>>>>>>>>>>>> is a bit hard to understand (maybe just me):
>>>>>>>>>>>>
>>>>>>>>>>>> Engines may model operations as deleting/inserting rows or as
>>>>>>>>>>>> modifications to rows that preserve row ids.
>>>>>>>>>>>>
>>>>>>>>>>>> Can you please help to explain?
>>>>>>>>>>>>
>>>>>>>>>>>> Steven Wu <stevenz...@gmail.com> wrote on Tue, Jul 15, 2025 at 04:41:
>>>>>>>>>>>>
>>>>>>>>>>>>> Manu,
>>>>>>>>>>>>>
>>>>>>>>>>>>> The spec already covers the row lineage carry-over (for replace):
>>>>>>>>>>>>> https://iceberg.apache.org/spec/#row-lineage
>>>>>>>>>>>>>
>>>>>>>>>>>>> "When an existing row is moved to a different data file for
>>>>>>>>>>>>> any reason, writers should write _row_id and
>>>>>>>>>>>>> _last_updated_sequence_number according to the following rules:"
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Another update on the release.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We have one open PR left for the 1.10.0 milestone
>>>>>>>>>>>>>> <https://github.com/apache/iceberg/milestone/54> (with 25
>>>>>>>>>>>>>> closed PRs). Amogh is actively working on the last blocker PR:
>>>>>>>>>>>>>> Spark 4.0: Preserve row lineage information on compaction
>>>>>>>>>>>>>> <https://github.com/apache/iceberg/pull/13555>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I will publish a release candidate after the above blocker is
>>>>>>>>>>>>>> merged and backported.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Amogh,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is it defined in the table spec that the "replace" operation
>>>>>>>>>>>>>>> should carry over existing lineage info instead of assigning
>>>>>>>>>>>>>>> new IDs? If not, we'd better first define it in the spec,
>>>>>>>>>>>>>>> because all engines and implementations need to follow it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> One other area I think we need to make sure works with row
>>>>>>>>>>>>>>>> lineage before the release is data file compaction. At the moment
>>>>>>>>>>>>>>>> <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44>,
>>>>>>>>>>>>>>>> it looks like compaction will read the records from the data
>>>>>>>>>>>>>>>> files without projecting the lineage fields. What this means
>>>>>>>>>>>>>>>> is that on write of the new compacted data files we'd be
>>>>>>>>>>>>>>>> losing the lineage information. There's no data change in a
>>>>>>>>>>>>>>>> compaction, but we do need to make sure the lineage info from
>>>>>>>>>>>>>>>> carried-over records is materialized in the newly compacted
>>>>>>>>>>>>>>>> files so they don't get new IDs or inherit the new file
>>>>>>>>>>>>>>>> sequence number. I'm working on addressing this as well, and
>>>>>>>>>>>>>>>> I'd call it out as a blocker too.
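To picture the gap Amogh describes, a hedged sketch of the idea behind the
fix, assuming the Spark integration exposes the v3 lineage fields under
their spec names; the actual approach in PR 13555 may differ:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class CompactionLineageSketch {
      // A rewrite must project the row-lineage fields alongside the data
      // columns so carried-over rows keep their ids instead of being
      // re-assigned on write. The column exposure is an assumption and
      // the table name is hypothetical.
      static Dataset<Row> scanWithLineage(SparkSession spark) {
        return spark.table("catalog.db.events")
            .selectExpr("*", "_row_id", "_last_updated_sequence_number");
      }
    }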
>>>>>>>>> --
>>>>>>>>> *Robin Moffatt*
>>>>>>>>> *Sr. Principal Advisor, Streaming Data Technologies*