Edited the subject line, as we are now into August.

We are still waiting for the following two changes for the 1.10.0 release:
* Anton's fix for the data frame join using the same snapshot, which will introduce a slight behavior change in Spark 4.0.
* Unknown type support.

On Fri, Aug 1, 2025 at 6:56 AM Alexandre Dutra <adu...@apache.org> wrote:

> Hi Steven,
>
> A small regression with S3 signing has been reported to me. The fix is simple:
>
> https://github.com/apache/iceberg/pull/13718
>
> Would it still be possible to have it in 1.10, please?
>
> Thanks,
> Alex
>
> On Thu, Jul 31, 2025 at 7:19 PM Steven Wu <stevenz...@gmail.com> wrote:
> >
> > Currently, the 1.10.0 milestone has no open PRs
> > https://github.com/apache/iceberg/milestone/54
> >
> > The variant PRs were merged this week and last week. There are still some variant testing related PRs, which are probably not blockers for the 1.10.0 release.
> > * Spark variant read: https://github.com/apache/iceberg/pull/13219
> > * use short strings: https://github.com/apache/iceberg/pull/13284
> >
> > We are still waiting for the following two changes
> > * Anton's fix for the data frame join using the same snapshot, which will introduce a slight behavior change in Spark 4.0.
> > * unknown type support. Fokko raised a discussion thread on a blocking issue.
> >
> > Did I miss anything else?
> >
> > On Sat, Jul 26, 2025 at 5:52 AM Fokko Driesprong <fo...@apache.org> wrote:
> >>
> >> Hey all,
> >>
> >> The read path for the UnknownType needs some community discussion. I've raised a separate thread. PTAL
> >>
> >> Kind regards from Belgium,
> >> Fokko
> >>
> >> On Sat, Jul 26, 2025 at 12:58 AM Ryan Blue <rdb...@gmail.com> wrote:
> >>>
> >>> I thought that we said we wanted to get support out for v3 features in this release unless there is some reasonable blocker, like Spark not having geospatial types. To me, I think that means we should aim to get variant and unknown done so that we have a complete implementation with a major engine. And it should not be particularly difficult to get unknown done so I'd opt to get it in.
> >>>
> >>> On Fri, Jul 25, 2025 at 11:28 AM Steven Wu <stevenz...@gmail.com> wrote:
> >>>>
> >>>> > I believe we also wanted to get in at least the read path for UnknownType. Fokko has a WIP PR for that.
> >>>> I thought in the community sync the consensus is that this is not a blocker, because it is a new feature implementation. If it is ready, it will be included.
> >>>>
> >>>> On Fri, Jul 25, 2025 at 9:43 AM Kevin Liu <kevinjq...@apache.org> wrote:
> >>>>>
> >>>>> I think Fokko's OOO. Should we help with that PR?
> >>>>>
> >>>>> On Fri, Jul 25, 2025 at 9:38 AM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:
> >>>>>>
> >>>>>> I believe we also wanted to get in at least the read path for UnknownType. Fokko has a WIP PR for that.
> >>>>>>
> >>>>>> On Fri, Jul 25, 2025 at 6:13 PM Steven Wu <stevenz...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> 3. Spark: fix data frame join based on different versions of the same table that may lead to weird results. Anton is working on a fix. It requires a small behavior change (table state may be stale up to refresh interval). Hence it is better to include it in the 1.10.0 release where Spark 4.0 is first supported.
> >>>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is very close and will prioritize the review.
> >>>>>>>
> >>>>>>> We still have the above two issues pending. 3 doesn't have a PR yet. PR for 4 is not associated with the milestone yet.
> >>>>>>>
> >>>>>>> On Fri, Jul 25, 2025 at 9:02 AM Kevin Liu <kevinjq...@apache.org> wrote:
> >>>>>>>>
> >>>>>>>> Thanks everyone for the review. The 2 PRs are both merged.
> >>>>>>>> Looks like there's only 1 PR left in the 1.10 milestone :)
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Kevin Liu
> >>>>>>>>
> >>>>>>>> On Thu, Jul 24, 2025 at 7:44 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Thanks Kevin. The first change is not in the versioned doc so it can be released anytime.
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>> Manu
> >>>>>>>>>
> >>>>>>>>> On Fri, Jul 25, 2025 at 4:21 AM Kevin Liu <kevinjq...@apache.org> wrote:
> >>>>>>>>>>
> >>>>>>>>>> The 3 PRs above are merged. Thanks everyone for the review.
> >>>>>>>>>>
> >>>>>>>>>> I've added 2 more PRs to the 1.10 milestone. These are both nice-to-haves.
> >>>>>>>>>> - docs: add subpage for REST Catalog Spec in "Specification" #13521
> >>>>>>>>>> - REST-Fixture: Ensure strict mode on jdbc catalog for rest fixture #13599
> >>>>>>>>>>
> >>>>>>>>>> The first one changes the link for "REST Catalog Spec" on the left nav of https://iceberg.apache.org/spec/ from the swagger.io link to a dedicated page for IRC.
> >>>>>>>>>> The second one fixes the default behavior of `iceberg-rest-fixture` image to align with the general expectation when creating a table in a catalog.
> >>>>>>>>>>
> >>>>>>>>>> Please take a look. I would like to have both of these as part of the 1.10 release.
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Kevin Liu
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Jul 23, 2025 at 1:31 PM Kevin Liu <kevinjq...@apache.org> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Here are the 3 PRs to add corresponding tests.
> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13648
> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13649
> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13650
> >>>>>>>>>>>
> >>>>>>>>>>> I've tagged them with the 1.10 milestone, waiting for CI to complete :)
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Kevin Liu
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Jul 23, 2025 at 1:08 PM Steven Wu <stevenz...@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Kevin, thanks for checking that. I will take a look at your backport PRs. Can you add them to the 1.10.0 milestone?
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Jul 23, 2025 at 12:27 PM Kevin Liu <kevinjq...@apache.org> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks again for driving this Steven!
> >>>>>>>>>>>>> We're very close!!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> As mentioned in the community sync today, I wanted to verify feature parity between Spark 3.5 and Spark 4.0 for this release.
> >>>>>>>>>>>>> I was able to verify that Spark 3.5 and Spark 4.0 have feature parity for this upcoming release. More details in the other devlist thread https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Kevin Liu
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <stevenz...@gmail.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Another update on the release.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The existing blocker PRs are almost done.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> During today's community sync, we identified the following issues/PRs to be included in the 1.10.0 release.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 1. backport of PR 13100 to the main branch. I have created a cherry-pick PR for that. There is a one line difference compared to the original PR due to the removal of the deprecated RemoveSnapshot class in main branch for 1.10.0 target. Amogh has suggested using RemoveSnapshots with a single snapshot id, which should be supported by all REST catalog servers.
> >>>>>>>>>>>>>> 2. Flink compaction doesn't support row lineage. Fail the compaction for V3 tables. I created a PR for that. Will backport after it is merged.
> >>>>>>>>>>>>>> 3. Spark: fix data frame join based on different versions of the same table that may lead to weird results. Anton is working on a fix. It requires a small behavior change (table state may be stale up to refresh interval). Hence it is better to include it in the 1.10.0 release where Spark 4.0 is first supported.
> >>>>>>>>>>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is very close and will prioritize the review.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> steven
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The 1.10.0 milestone can be found here.
> >>>>>>>>>>>>>> https://github.com/apache/iceberg/milestone/54
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <stevenz...@gmail.com> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Ajantha/Robin, thanks for the note. We can include the PR in the 1.10.0 milestone.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt <ro...@confluent.io.invalid> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks Ajantha. Just to confirm, from a Confluent point of view, we will not be able to publish the connector on Confluent Hub until this CVE[1] is fixed.
> >>>>>>>>>>>>>>>> Since we would not publish a snapshot build, if the fix doesn't make it into 1.10 then we'd have to wait for 1.11 (or a dot release of 1.10) to be able to include the connector on Confluent Hub.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks, Robin.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> [1] https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <ajanthab...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I have approached Confluent people to help us publish the OSS Kafka Connect Iceberg sink plugin.
> >>>>>>>>>>>>>>>>> It seems we have a CVE from dependency that blocks us from publishing the plugin.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Please include the below PR for 1.10.0 release which fixes that.
> >>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/13561
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> - Ajantha
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> > Engines may model operations as deleting/inserting rows or as modifications to rows that preserve row ids.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Manu, I agree this sentence probably lacks some context. The first half (as deleting/inserting rows) is probably about the row lineage handling with equality deletes, which is described in another place.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> "Row lineage does not track lineage for rows updated via Equality Deletes, because engines using equality deletes avoid reading existing data before writing changes and can't provide the original row ID for the new rows. These updates are always treated as if the existing row was completely removed and a unique new row was added."
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Thanks Steven, I missed that part but the following sentence is a bit hard to understand (maybe just me)
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Engines may model operations as deleting/inserting rows or as modifications to rows that preserve row ids.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Can you please help to explain?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 4:41 AM Steven Wu <stevenz...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Manu
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> The spec already covers the row lineage carry over (for replace)
> >>>>>>>>>>>>>>>>>>>> https://iceberg.apache.org/spec/#row-lineage
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> "When an existing row is moved to a different data file for any reason, writers should write _row_id and _last_updated_sequence_number according to the following rules:"
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>> Steven
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> another update on the release.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> We have one open PR left for the 1.10.0 milestone (with 25 closed PRs). Amogh is actively working on the last blocker PR.
> >>>>>>>>>>>>>>>>>>>>> Spark 4.0: Preserve row lineage information on compaction
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I will publish a release candidate after the above blocker is merged and backported.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>>> Steven
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Hi Amogh,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Is it defined in the table spec that the "replace" operation should carry over existing lineage info instead of assigning new IDs? If not, we should first define it in the spec, because all engines and implementations need to follow it.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> One other area I think we need to make sure works with row lineage before release is data file compaction. At the moment, it looks like compaction will read the records from the data files without projecting the lineage fields. What this means is that on write of the new compacted data files we'd be losing the lineage information. There's no data change in a compaction but we do need to make sure the lineage info from carried over records is materialized in the newly compacted files so they don't get new IDs or inherit the new file sequence number. I'm working on addressing this as well, but I'd call this out as a blocker as well.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>> Robin Moffatt
> >>>>>>>>>>>>>>>> Sr. Principal Advisor, Streaming Data Technologies