Thanks, Cheng! Right now, we probably have a good lead on why the multiple staging repositories happen. I can avoid it by running the staging-binaries.sh script from my home network.
I have created a PR to update the release doc. I will not merge it until I have verified the fix from my home network.
https://github.com/apache/iceberg/pull/13978

On Tue, Sep 2, 2025 at 10:38 AM Cheng Pan <pan3...@gmail.com> wrote:

> > will they be merged when releasing to Maven central?
>
> Yes, select all voted repos and click "Release" button after voting pass, then artifacts under those repos will go Maven Central.
>
> Thanks,
> Cheng Pan
>
> On Sep 3, 2025, at 01:31, Steven Wu <stevenz...@gmail.com> wrote:
>
> Thanks, Cheng.
>
> You are right. There were two public IPs in the two repositories.
>
> https://stackoverflow.com/questions/15511484/mvn-releaseperform-creates-multiple-staging-repos
> This can happen in corporate environments where a floating IP address proxies outbound requests.
>
> > I think it doesn’t matter, just listing all repo links in the vote thread is fine.
>
> Are you saying that we can just release the two staging repositories? will they be merged when releasing to Maven central?
>
> On Tue, Sep 2, 2025 at 10:11 AM Cheng Pan <pan3...@gmail.com> wrote:
>
>> Have you checked repository.apache.org? I remember the staging repo will record the Client IP.
>>
>> It’s likely that you have multiple Public IPs in your local network and the HTTP connections happen to go via different IPs.
>>
>> I think it doesn’t matter, just listing all repo links in the vote thread is fine.
>>
>> Thanks,
>> Cheng Pan
>>
>> On Sep 3, 2025, at 01:02, Steven Wu <stevenz...@gmail.com> wrote:
>>
>> sorry, the PR link for the staging-binaries.sh was wrong (missing a digit).
>>
>> I thought this PR will fix the issue. Initially, it worked well with a few runs. But later I am still experiencing the same problem. Suggestions are appreciated!
>> https://github.com/apache/iceberg/pull/13958
>>
>> On Tue, Sep 2, 2025 at 9:51 AM Steven Wu <stevenz...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Just to update the community on the status.
>>>
>>> Fokko also reached out to include Parquet Java 1.16.0 in this release. Vote just passed in the Parquet community. We are waiting for the binary release. We will try to include it in the 1.10.0 release. Reviews are welcomed.
>>> https://github.com/apache/iceberg/pull/1394
>>>
>>> We also ran into a couple of issues with the release script/process.
>>>
>>> 1) staging-binaries.sh has race conditions on concurrent publish and 2 folders in Maven repo.
>>>
>>> I thought this PR will fix the issue. Initially, it worked well with a few runs. But later I am still experiencing the same problem. Suggestions are appreciated!
>>> https://github.com/apache/iceberg/pull/13958
>>>
>>> 2) Yuya found out that the iceberg-api module wasn't published in the RC2 staging (1243).
>>> https://repository.apache.org/content/repositories/orgapacheiceberg-1243/
>>>
>>> The first release issue is the more annoying/impacting problem. The second release issue is uncommon, as I didn't see it in a few other runs of staging-binaries.sh.
>>>
>>> Thanks,
>>> Steven
>>>
>>> On Sun, Aug 31, 2025 at 12:48 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>
>>>> I started a vote thread for 1.10.0 RC2.
>>>>
>>>> I have to fix a couple of release script issues. Hence the first release candidate is RC2 to vote.
>>>>
>>>> On Fri, Aug 29, 2025 at 9:53 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>
>>>>> Thanks Steven! I did another pass to check for feature parity between spark 3.5 and spark 4.0 for this release and everything looks good. There are a few test cases that have not been ported, but we can punt those for now.
>>>>>
>>>>> Best,
>>>>> Kevin Liu
>>>>>
>>>>> On Thu, Aug 28, 2025 at 7:08 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>
>>>>>> Thanks to Fokko and Ryan, the unknown type support PR was merged today.
>>>>>>
>>>>>> Everything in the 1.10.0 milestone is closed now.
>>>>>>
>>>>>> I will work on a release candidate next.
>>>>>>
>>>>>> On Fri, Aug 8, 2025 at 6:14 AM Fokko Driesprong <fo...@apache.org> wrote:
>>>>>>
>>>>>>> Hi Steven,
>>>>>>>
>>>>>>> Thanks for updating this thread.
>>>>>>>
>>>>>>> I've updated the UnknownType PR <https://github.com/apache/iceberg/pull/13445> to first block on the complex cases that will require some more discussion. This way we can revisit this also after the 1.10.0 release.
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Fokko
>>>>>>>
>>>>>>> On Thu, Aug 7, 2025 at 23:56, Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>
>>>>>>>> edited the subject line as we are into August.
>>>>>>>>
>>>>>>>> We are still waiting for the following two changes for the 1.10.0 release
>>>>>>>> * Anton's fix for the data frame join using the same snapshot, which will introduce a slight behavior change in spark 4.0.
>>>>>>>> * unknown type support.
>>>>>>>>
>>>>>>>> On Fri, Aug 1, 2025 at 6:56 AM Alexandre Dutra <adu...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi Steven,
>>>>>>>>>
>>>>>>>>> A small regression with S3 signing has been reported to me. The fix is simple:
>>>>>>>>>
>>>>>>>>> https://github.com/apache/iceberg/pull/13718
>>>>>>>>>
>>>>>>>>> Would it be still possible to have it in 1.10 please?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Alex
>>>>>>>>>
>>>>>>>>> On Thu, Jul 31, 2025 at 7:19 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>> >
>>>>>>>>> > Currently, the 1.10.0 milestone has no open PRs
>>>>>>>>> > https://github.com/apache/iceberg/milestone/54
>>>>>>>>> >
>>>>>>>>> > The variant PR was merged this and last week. There are still some variant testing related PRs, which are probably not blockers for 1.10.0 release.
>>>>>>>>> > * Spark variant read: https://github.com/apache/iceberg/pull/13219
>>>>>>>>> > * use short strings: https://github.com/apache/iceberg/pull/13284
>>>>>>>>> >
>>>>>>>>> > We are still waiting for the following two changes
>>>>>>>>> > * Anton's fix for the data frame join using the same snapshot, which will introduce a slight behavior change in spark 4.0.
>>>>>>>>> > * unknown type support. Fokko raised a discussion thread on a blocking issue.
>>>>>>>>> >
>>>>>>>>> > Did I miss anything else?
>>>>>>>>> >
>>>>>>>>> > On Sat, Jul 26, 2025 at 5:52 AM Fokko Driesprong <fo...@apache.org> wrote:
>>>>>>>>> >>
>>>>>>>>> >> Hey all,
>>>>>>>>> >>
>>>>>>>>> >> The read path for the UnknownType needs some community discussion. I've raised a separate thread. PTAL
>>>>>>>>> >>
>>>>>>>>> >> Kind regards from Belgium,
>>>>>>>>> >> Fokko
>>>>>>>>> >>
>>>>>>>>> >> On Sat, Jul 26, 2025 at 00:58, Ryan Blue <rdb...@gmail.com> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> I thought that we said we wanted to get support out for v3 features in this release unless there is some reasonable blocker, like Spark not having geospatial types. To me, I think that means we should aim to get variant and unknown done so that we have a complete implementation with a major engine. And it should not be particularly difficult to get unknown done so I'd opt to get it in.
>>>>>>>>> >>>
>>>>>>>>> >>> On Fri, Jul 25, 2025 at 11:28 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>> >>>>
>>>>>>>>> >>>> > I believe we also wanted to get in at least the read path for UnknownType. Fokko has a WIP PR for that.
>>>>>>>>> >>>> I thought in the community sync the consensus is that this is not a blocker, because it is a new feature implementation. If it is ready, it will be included.
>>>>>>>>> >>>>
>>>>>>>>> >>>> On Fri, Jul 25, 2025 at 9:43 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> I think Fokko's OOO. Should we help with that PR?
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> On Fri, Jul 25, 2025 at 9:38 AM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> I believe we also wanted to get in at least the read path for UnknownType. Fokko has a WIP PR for that.
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> On Fri, Jul 25, 2025 at 6:13 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> 3. Spark: fix data frame join based on different versions of the same table that may lead to weird results. Anton is working on a fix. It requires a small behavior change (table state may be stale up to refresh interval). Hence it is better to include it in the 1.10.0 release where Spark 4.0 is first supported.
>>>>>>>>> >>>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is very close and will prioritize the review.
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> We still have the above two issues pending. 3 doesn't have a PR yet. PR for 4 is not associated with the milestone yet.
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> On Fri, Jul 25, 2025 at 9:02 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> Thanks everyone for the review. The 2 PRs are both merged.
>>>>>>>>> >>>>>>>> Looks like there's only 1 PR left in the 1.10 milestone :)
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> Best,
>>>>>>>>> >>>>>>>> Kevin Liu
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> On Thu, Jul 24, 2025 at 7:44 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> Thanks Kevin. The first change is not in the versioned doc so it can be released anytime.
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> Regards,
>>>>>>>>> >>>>>>>>> Manu
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> On Fri, Jul 25, 2025 at 4:21 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> The 3 PRs above are merged. Thanks everyone for the review.
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> I've added 2 more PRs to the 1.10 milestone. These are both nice-to-haves.
>>>>>>>>> >>>>>>>>>> - docs: add subpage for REST Catalog Spec in "Specification" #13521
>>>>>>>>> >>>>>>>>>> - REST-Fixture: Ensure strict mode on jdbc catalog for rest fixture #13599
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> The first one changes the link for "REST Catalog Spec" on the left nav of https://iceberg.apache.org/spec/ from the swagger.io link to a dedicated page for IRC.
>>>>>>>>> >>>>>>>>>> The second one fixes the default behavior of `iceberg-rest-fixture` image to align with the general expectation when creating a table in a catalog.
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> Please take a look. I would like to have both of these as part of the 1.10 release.
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> Best,
>>>>>>>>> >>>>>>>>>> Kevin Liu
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> On Wed, Jul 23, 2025 at 1:31 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>>>> >>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>> Here are the 3 PRs to add corresponding tests.
>>>>>>>>> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13648
>>>>>>>>> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13649
>>>>>>>>> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13650
>>>>>>>>> >>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>> I've tagged them with the 1.10 milestone, waiting for CI to complete :)
>>>>>>>>> >>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>> Best,
>>>>>>>>> >>>>>>>>>>> Kevin Liu
>>>>>>>>> >>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>> On Wed, Jul 23, 2025 at 1:08 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>> Kevin, thanks for checking that. I will take a look at your backport PRs. Can you add them to the 1.10.0 milestone?
>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>> On Wed, Jul 23, 2025 at 12:27 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>> Thanks again for driving this Steven! We're very close!!
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>> As mentioned in the community sync today, I wanted to verify feature parity between Spark 3.5 and Spark 4.0 for this release.
>>>>>>>>> >>>>>>>>>>>>> I was able to verify that Spark 3.5 and Spark 4.0 have feature parity for this upcoming release. More details in the other devlist thread https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>> Thanks,
>>>>>>>>> >>>>>>>>>>>>> Kevin Liu
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>> On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> Another update on the release.
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> The existing blocker PRs are almost done.
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> During today's community sync, we identified the following issues/PRs to be included in the 1.10.0 release.
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> 1. backport of PR 13100 to the main branch. I have created a cherry-pick PR for that. There is a one line difference compared to the original PR due to the removal of the deprecated RemoveSnapshot class in main branch for 1.10.0 target. Amogh has suggested using RemoveSnapshots with a single snapshot id, which should be supported by all REST catalog servers.
>>>>>>>>> >>>>>>>>>>>>>> 2. Flink compaction doesn't support row lineage. Fail the compaction for V3 tables. I created a PR for that. Will backport after it is merged.
>>>>>>>>> >>>>>>>>>>>>>> 3. Spark: fix data frame join based on different versions of the same table that may lead to weird results. Anton is working on a fix. It requires a small behavior change (table state may be stale up to refresh interval). Hence it is better to include it in the 1.10.0 release where Spark 4.0 is first supported.
>>>>>>>>> >>>>>>>>>>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this is very close and will prioritize the review.
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> Thanks,
>>>>>>>>> >>>>>>>>>>>>>> steven
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> The 1.10.0 milestone can be found here.
>>>>>>>>> >>>>>>>>>>>>>> https://github.com/apache/iceberg/milestone/54
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>> Ajantha/Robin, thanks for the note. We can include the PR in the 1.10.0 milestone.
>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt <ro...@confluent.io.invalid> wrote:
>>>>>>>>> >>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>> Thanks Ajantha. Just to confirm, from a Confluent point of view, we will not be able to publish the connector on Confluent Hub until this CVE[1] is fixed.
>>>>>>>>> >>>>>>>>>>>>>>>> Since we would not publish a snapshot build, if the fix doesn't make it into 1.10 then we'd have to wait for 1.11 (or a dot release of 1.10) to be able to include the connector on Confluent Hub.
>>>>>>>>> >>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, Robin.
>>>>>>>>> >>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>> [1] https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>>>>>>>>> >>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>> I have approached Confluent people to help us publish the OSS Kafka Connect Iceberg sink plugin.
>>>>>>>>> >>>>>>>>>>>>>>>>> It seems we have a CVE from dependency that blocks us from publishing the plugin.
>>>>>>>>> >>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>> Please include the below PR for 1.10.0 release which fixes that.
>>>>>>>>> >>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/13561
>>>>>>>>> >>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>> >>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>> > Engines may model operations as deleting/inserting rows or as modifications to rows that preserve row ids.
>>>>>>>>> >>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>> Manu, I agree this sentence probably lacks some context. The first half (as deleting/inserting rows) is probably about the row lineage handling with equality deletes, which is described in another place.
>>>>>>>>> >>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>> "Row lineage does not track lineage for rows updated via Equality Deletes, because engines using equality deletes avoid reading existing data before writing changes and can't provide the original row ID for the new rows. These updates are always treated as if the existing row was completely removed and a unique new row was added."
>>>>>>>>> >>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks Steven, I missed that part but the following sentence is a bit hard to understand (maybe just me)
>>>>>>>>> >>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>> Engines may model operations as deleting/inserting rows or as modifications to rows that preserve row ids.
>>>>>>>>> >>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>> Can you please help to explain?
>>>>>>>>> >>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 04:41, Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Manu
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The spec already covers the row lineage carry over (for replace)
>>>>>>>>> >>>>>>>>>>>>>>>>>>>> https://iceberg.apache.org/spec/#row-lineage
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>> "When an existing row is moved to a different data file for any reason, writers should write _row_id and _last_updated_sequence_number according to the following rules:"
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Steven
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> another update on the release.
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> We have one open PR left for the 1.10.0 milestone (with 25 closed PRs). Amogh is actively working on the last blocker PR.
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Spark 4.0: Preserve row lineage information on compaction
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I will publish a release candidate after the above blocker is merged and backported.
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Steven
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hi Amogh,
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Is it defined in the table spec that "replace" operation should carry over existing lineage info instead of assigning new IDs? If not, we'd better firstly define it in spec because all engines and implementations need to follow it.
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> One other area I think we need to make sure works with row lineage before release is data file compaction. At the moment, it looks like compaction will read the records from the data files without projecting the lineage fields. What this means is that on write of the new compacted data files we'd be losing the lineage information. There's no data change in a compaction but we do need to make sure the lineage info from carried over records is materialized in the newly compacted files so they don't get new IDs or inherit the new file sequence number. I'm working on addressing this as well, but I'd call this out as a blocker as well.
>>>>>>>>> >>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>> --
>>>>>>>>> >>>>>>>>>>>>>>>> Robin Moffatt
>>>>>>>>> >>>>>>>>>>>>>>>> Sr. Principal Advisor, Streaming Data Technologies