I think Fokko's OOO. Should we help with that PR?

On Fri, Jul 25, 2025 at 9:38 AM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:
> I believe we also wanted to get in at least the read path for UnknownType.
> Fokko has a WIP PR <https://github.com/apache/iceberg/pull/13445> for that.
>
> On Fri, Jul 25, 2025 at 6:13 PM Steven Wu <stevenz...@gmail.com> wrote:
>
>> 3. Spark: fix data frame join based on different versions of the same
>> table that may lead to weird results. Anton is working on a fix. It
>> requires a small behavior change (table state may be stale up to the
>> refresh interval). Hence it is better to include it in the 1.10.0 release
>> where Spark 4.0 is first supported.
>> 4. Variant support in core and Spark 4.0. Ryan thinks this is very close
>> and will prioritize the review.
>>
>> We still have the above two issues pending. 3 doesn't have a PR yet. The
>> PR for 4 is not associated with the milestone yet.
>>
>> On Fri, Jul 25, 2025 at 9:02 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>
>>> Thanks everyone for the review. The 2 PRs are both merged.
>>> Looks like there's only 1 PR left in the 1.10 milestone
>>> <https://github.com/apache/iceberg/milestone/54> :)
>>>
>>> Best,
>>> Kevin Liu
>>>
>>> On Thu, Jul 24, 2025 at 7:44 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>
>>>> Thanks Kevin. The first change is not in the versioned doc, so it can
>>>> be released anytime.
>>>>
>>>> Regards,
>>>> Manu
>>>>
>>>> On Fri, Jul 25, 2025 at 4:21 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>
>>>>> The 3 PRs above are merged. Thanks everyone for the review.
>>>>>
>>>>> I've added 2 more PRs to the 1.10 milestone. These are both
>>>>> nice-to-haves.
>>>>> - docs: add subpage for REST Catalog Spec in "Specification" #13521
>>>>> <https://github.com/apache/iceberg/pull/13521>
>>>>> - REST-Fixture: Ensure strict mode on jdbc catalog for rest fixture
>>>>> #13599 <https://github.com/apache/iceberg/pull/13599>
>>>>>
>>>>> The first one changes the link for "REST Catalog Spec" on the left nav
>>>>> of https://iceberg.apache.org/spec/ from the swagger.io link to a
>>>>> dedicated page for IRC.
>>>>> The second one fixes the default behavior of the `iceberg-rest-fixture`
>>>>> image to align with the general expectation when creating a table in a
>>>>> catalog.
>>>>>
>>>>> Please take a look. I would like to have both of these as part of the
>>>>> 1.10 release.
>>>>>
>>>>> Best,
>>>>> Kevin Liu
>>>>>
>>>>> On Wed, Jul 23, 2025 at 1:31 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>
>>>>>> Here are the 3 PRs to add the corresponding tests:
>>>>>> https://github.com/apache/iceberg/pull/13648
>>>>>> https://github.com/apache/iceberg/pull/13649
>>>>>> https://github.com/apache/iceberg/pull/13650
>>>>>>
>>>>>> I've tagged them with the 1.10 milestone, waiting for CI to complete :)
>>>>>>
>>>>>> Best,
>>>>>> Kevin Liu
>>>>>>
>>>>>> On Wed, Jul 23, 2025 at 1:08 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>
>>>>>>> Kevin, thanks for checking that. I will take a look at your backport
>>>>>>> PRs. Can you add them to the 1.10.0 milestone?
>>>>>>>
>>>>>>> On Wed, Jul 23, 2025 at 12:27 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>>
>>>>>>>> Thanks again for driving this, Steven! We're very close!!
>>>>>>>>
>>>>>>>> As mentioned in the community sync today, I wanted to verify
>>>>>>>> feature parity between Spark 3.5 and Spark 4.0 for this release.
>>>>>>>> I was able to verify that Spark 3.5 and Spark 4.0 have feature
>>>>>>>> parity for this upcoming release.
>>>>>>>> More details are in the other devlist thread:
>>>>>>>> https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Kevin Liu
>>>>>>>>
>>>>>>>> On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Another update on the release.
>>>>>>>>>
>>>>>>>>> The existing blocker PRs are almost done.
>>>>>>>>>
>>>>>>>>> During today's community sync, we identified the following
>>>>>>>>> issues/PRs to be included in the 1.10.0 release.
>>>>>>>>>
>>>>>>>>> 1. Backport of PR 13100 to the main branch. I have created a
>>>>>>>>> cherry-pick PR <https://github.com/apache/iceberg/pull/13647> for
>>>>>>>>> that. There is a one-line difference compared to the original PR due
>>>>>>>>> to the removal of the deprecated RemoveSnapshot class in the main
>>>>>>>>> branch for the 1.10.0 target. Amogh has suggested using
>>>>>>>>> RemoveSnapshots with a single snapshot id, which should be supported
>>>>>>>>> by all REST catalog servers.
>>>>>>>>> 2. Flink compaction doesn't support row lineage. Fail the
>>>>>>>>> compaction for V3 tables. I created a PR
>>>>>>>>> <https://github.com/apache/iceberg/pull/13646> for that. Will
>>>>>>>>> backport after it is merged.
>>>>>>>>> 3. Spark: fix data frame join based on different versions of
>>>>>>>>> the same table that may lead to weird results. Anton is working on
>>>>>>>>> a fix. It requires a small behavior change (table state may be stale
>>>>>>>>> up to the refresh interval). Hence it is better to include it in the
>>>>>>>>> 1.10.0 release where Spark 4.0 is first supported.
>>>>>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is
>>>>>>>>> very close and will prioritize the review.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Steven
>>>>>>>>>
>>>>>>>>> The 1.10.0 milestone can be found here.
>>>>>>>>> https://github.com/apache/iceberg/milestone/54
>>>>>>>>>
>>>>>>>>> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Ajantha/Robin, thanks for the note. We can include the PR in the
>>>>>>>>>> 1.10.0 milestone.
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt
>>>>>>>>>> <ro...@confluent.io.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Ajantha. Just to confirm, from a Confluent point of view,
>>>>>>>>>>> we will not be able to publish the connector on Confluent Hub
>>>>>>>>>>> until this CVE[1] is fixed.
>>>>>>>>>>> Since we would not publish a snapshot build, if the fix doesn't
>>>>>>>>>>> make it into 1.10 then we'd have to wait for 1.11 (or a dot
>>>>>>>>>>> release of 1.10) to be able to include the connector on Confluent
>>>>>>>>>>> Hub.
>>>>>>>>>>>
>>>>>>>>>>> Thanks, Robin
>>>>>>>>>>>
>>>>>>>>>>> [1] https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>>>>>>>>>>>
>>>>>>>>>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I have approached Confluent people
>>>>>>>>>>>> <https://github.com/apache/iceberg/issues/10745#issuecomment-3058281281>
>>>>>>>>>>>> to help us publish the OSS Kafka Connect Iceberg sink plugin.
>>>>>>>>>>>> It seems we have a CVE from a dependency that blocks us from
>>>>>>>>>>>> publishing the plugin.
>>>>>>>>>>>>
>>>>>>>>>>>> Please include the PR below, which fixes that, in the 1.10.0
>>>>>>>>>>>> release:
>>>>>>>>>>>> https://github.com/apache/iceberg/pull/13561
>>>>>>>>>>>>
>>>>>>>>>>>> - Ajantha
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> > Engines may model operations as deleting/inserting rows or
>>>>>>>>>>>>> > as modifications to rows that preserve row ids.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Manu, I agree this sentence probably lacks some context. The
>>>>>>>>>>>>> first half (as deleting/inserting rows) is probably about the
>>>>>>>>>>>>> row lineage handling with equality deletes, which is described
>>>>>>>>>>>>> in another place:
>>>>>>>>>>>>>
>>>>>>>>>>>>> "Row lineage does not track lineage for rows updated via
>>>>>>>>>>>>> Equality Deletes
>>>>>>>>>>>>> <https://iceberg.apache.org/spec/#equality-delete-files>,
>>>>>>>>>>>>> because engines using equality deletes avoid reading existing
>>>>>>>>>>>>> data before writing changes and can't provide the original row
>>>>>>>>>>>>> ID for the new rows. These updates are always treated as if the
>>>>>>>>>>>>> existing row was completely removed and a unique new row was
>>>>>>>>>>>>> added."
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks Steven, I missed that part but the following sentence
>>>>>>>>>>>>>> is a bit hard to understand (maybe just me):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Engines may model operations as deleting/inserting rows or as
>>>>>>>>>>>>>> modifications to rows that preserve row ids.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you please help to explain?
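[The spec passage quoted above can be made concrete with a simplified, purely illustrative Python model. This is not Iceberg code: the `Row` class, function names, and id values are invented for the sketch; only the field names `_row_id` and `_last_updated_sequence_number` come from the spec. It shows why a copy-on-write update, which reads the old row, can preserve its row id, while an equality-delete update, which never reads the old row, must assign a fresh one.]

```python
# Hypothetical model of Iceberg v3 row lineage under two update styles.
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    key: int
    value: str
    row_id: int            # corresponds to the spec's _row_id
    last_updated_seq: int  # corresponds to _last_updated_sequence_number


def cow_update(row: Row, new_value: str, commit_seq: int) -> Row:
    """Copy-on-write: the engine read the old row, so lineage carries over."""
    return Row(row.key, new_value, row.row_id, commit_seq)


def eq_delete_update(key: int, new_value: str, commit_seq: int,
                     next_row_id: int) -> Row:
    """Equality delete + insert: the old row was never read, so the new
    row gets a fresh _row_id (treated as delete + unique new row)."""
    return Row(key, new_value, next_row_id, commit_seq)


original = Row(key=1, value="a", row_id=100, last_updated_seq=5)
cow = cow_update(original, "b", commit_seq=9)
eq = eq_delete_update(1, "b", commit_seq=9, next_row_id=200)

assert cow.row_id == original.row_id  # lineage preserved
assert eq.row_id != original.row_id   # lineage lost, as the spec describes
```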
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Steven Wu <stevenz...@gmail.com> wrote on Tue, Jul 15, 2025 at 04:41:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Manu,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The spec already covers the row lineage carry-over (for
>>>>>>>>>>>>>>> replace):
>>>>>>>>>>>>>>> https://iceberg.apache.org/spec/#row-lineage
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> "When an existing row is moved to a different data file for
>>>>>>>>>>>>>>> any reason, writers should write _row_id and
>>>>>>>>>>>>>>> _last_updated_sequence_number according to the following
>>>>>>>>>>>>>>> rules:"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Another update on the release.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We have one open PR left for the 1.10.0 milestone
>>>>>>>>>>>>>>>> <https://github.com/apache/iceberg/milestone/54> (with 25
>>>>>>>>>>>>>>>> closed PRs). Amogh is actively working on the last blocker PR:
>>>>>>>>>>>>>>>> Spark 4.0: Preserve row lineage information on compaction
>>>>>>>>>>>>>>>> <https://github.com/apache/iceberg/pull/13555>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I will publish a release candidate after the above blocker
>>>>>>>>>>>>>>>> is merged and backported.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Amogh,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Is it defined in the table spec that the "replace"
>>>>>>>>>>>>>>>>> operation should carry over existing lineage info instead
>>>>>>>>>>>>>>>>> of assigning new IDs? If not, we'd better first define it
>>>>>>>>>>>>>>>>> in the spec, because all engines and implementations need
>>>>>>>>>>>>>>>>> to follow it.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> One other area I think we need to make sure works with
>>>>>>>>>>>>>>>>>> row lineage before the release is data file compaction.
>>>>>>>>>>>>>>>>>> At the moment
>>>>>>>>>>>>>>>>>> <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44>,
>>>>>>>>>>>>>>>>>> it looks like compaction will read the records from the
>>>>>>>>>>>>>>>>>> data files without projecting the lineage fields. What
>>>>>>>>>>>>>>>>>> this means is that on write of the new compacted data
>>>>>>>>>>>>>>>>>> files we'd be losing the lineage information. There's no
>>>>>>>>>>>>>>>>>> data change in a compaction, but we do need to make sure
>>>>>>>>>>>>>>>>>> the lineage info from carried-over records is
>>>>>>>>>>>>>>>>>> materialized in the newly compacted files so they don't
>>>>>>>>>>>>>>>>>> get new IDs or inherit the new file sequence number. I'm
>>>>>>>>>>>>>>>>>> working on addressing this, and I'd call it out as a
>>>>>>>>>>>>>>>>>> blocker as well.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> *Robin Moffatt*
>>>>>>>>>>> *Sr. Principal Advisor, Streaming Data Technologies*
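[The compaction issue Amogh describes above can be sketched in illustrative Python, not the actual Spark/Java rewrite code. Only the column names `_row_id` and `_last_updated_sequence_number` come from the spec; the `compact` function, column lists, and sample values are invented for the sketch. It shows the failure mode: if the rewrite projects only data columns, the lineage columns are dropped from the new file, and readers would assign fresh ids; projecting and writing the lineage columns keeps carried-over rows intact.]

```python
# Illustrative sketch of compaction with and without projecting lineage columns.
DATA_COLUMNS = ["id", "data"]
LINEAGE_COLUMNS = ["_row_id", "_last_updated_sequence_number"]


def compact(rows, project_lineage):
    """Rewrite rows into a 'new data file', keeping only projected columns."""
    columns = DATA_COLUMNS + (LINEAGE_COLUMNS if project_lineage else [])
    return [{c: row[c] for c in columns} for row in rows]


old_file = [{"id": 1, "data": "a",
             "_row_id": 100, "_last_updated_sequence_number": 5}]

lossy = compact(old_file, project_lineage=False)
fixed = compact(old_file, project_lineage=True)

# Without projection, lineage is absent from the rewritten file, so rows
# would inherit new ids / the new file's sequence number on read.
assert "_row_id" not in lossy[0]
# With projection, the original lineage is materialized in the new file.
assert fixed[0]["_row_id"] == 100
assert fixed[0]["_last_updated_sequence_number"] == 5
```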