Thanks, Cheng!

Right now, we probably have a good lead on why the multiple staging
repositories happen. I can avoid it by running the staging-binaries.sh
script from my home network.

I have created a PR to update the release doc. I will not merge it until I
have verified it from my home network.
https://github.com/apache/iceberg/pull/13978



On Tue, Sep 2, 2025 at 10:38 AM Cheng Pan <pan3...@gmail.com> wrote:

> > will they be merged when releasing to Maven central?
>
> Yes, select all voted repos and click the "Release" button after the vote
> passes; the artifacts under those repos will then go to Maven Central.
>
> Thanks,
> Cheng Pan
>
>
>
> On Sep 3, 2025, at 01:31, Steven Wu <stevenz...@gmail.com> wrote:
>
> Thanks, Cheng.
>
> You are right. There were two public IPs in the two repositories.
>
>
> https://stackoverflow.com/questions/15511484/mvn-releaseperform-creates-multiple-staging-repos
> This can happen in corporate environments where a floating IP address
> proxies outbound requests.
>
> > I think it doesn’t matter, just listing all repo links in the vote
> thread is fine.
>
> Are you saying that we can just release the two staging repositories? Will
> they be merged when releasing to Maven Central?
>
> On Tue, Sep 2, 2025 at 10:11 AM Cheng Pan <pan3...@gmail.com> wrote:
>
>> Have you checked repository.apache.org? I remember the staging repo will
>> record the Client IP.
>>
>> It’s likely that you have multiple public IPs in your local network and
>> the HTTP connections happen to go via different IPs.
>>
>> I think it doesn’t matter, just listing all repo links in the vote thread
>> is fine.
>>
>> Thanks,
>> Cheng Pan
>>
>>
>>
>> On Sep 3, 2025, at 01:02, Steven Wu <stevenz...@gmail.com> wrote:
>>
>> Sorry, the PR link for staging-binaries.sh was wrong (missing a
>> digit).
>>
>> I thought this PR would fix the issue. Initially, it worked well for a
>> few runs, but later I experienced the same problem again. Suggestions
>> are appreciated!
>> https://github.com/apache/iceberg/pull/13958
>>
>> On Tue, Sep 2, 2025 at 9:51 AM Steven Wu <stevenz...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Just to update the community on the status.
>>>
>>> Fokko also reached out to include Parquet Java 1.16.0 in this release.
>>> Vote just passed in the Parquet community. We are waiting for the binary
>>> release. We will try to include it in the 1.10.0 release. Reviews are
>>> welcomed.
>>> https://github.com/apache/iceberg/pull/1394
>>>
>>> We also ran into a couple of issues with the release script/process.
>>>
>>> 1) staging-binaries.sh has race conditions on concurrent publishing and
>>> produces 2 folders in the Maven repo.
>>>
>>> I thought this PR would fix the issue. Initially, it worked well for a
>>> few runs, but later I experienced the same problem again. Suggestions
>>> are appreciated!
>>> https://github.com/apache/iceberg/pull/13958
>>>
>>> 2) Yuya found out that the iceberg-api module wasn't published in the
>>> RC2 staging (1243).
>>> https://repository.apache.org/content/repositories/orgapacheiceberg-1243/
>>>
>>> The first release issue is the more annoying and impactful problem. The
>>> second release issue is uncommon, as I didn't see it in a few other runs of
>>> staging-binaries.sh.
>>>
>>> Thanks,
>>> Steven
>>>
>>>
>>>
>>> On Sun, Aug 31, 2025 at 12:48 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>
>>>> I started a vote thread for 1.10.0 RC2.
>>>>
>>>> I had to fix a couple of release script issues. Hence the first
>>>> release candidate put to a vote is RC2.
>>>>
>>>> On Fri, Aug 29, 2025 at 9:53 AM Kevin Liu <kevinjq...@apache.org>
>>>> wrote:
>>>>
>>>>> Thanks Steven! I did another pass to check for feature parity between
>>>>> spark 3.5 and spark 4.0 for this release and everything looks good. There
>>>>> are a few test cases that have not been ported, but we can punt those for
>>>>> now.
>>>>>
>>>>> Best,
>>>>> Kevin Liu
>>>>>
>>>>> On Thu, Aug 28, 2025 at 7:08 PM Steven Wu <stevenz...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks to Fokko and Ryan, the unknown type support PR was merged
>>>>>> today.
>>>>>>
>>>>>> Everything in the 1.10.0 milestone is closed now.
>>>>>>
>>>>>> I will work on a release candidate next.
>>>>>>
>>>>>> On Fri, Aug 8, 2025 at 6:14 AM Fokko Driesprong <fo...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Steven,
>>>>>>>
>>>>>>> Thanks for updating this thread.
>>>>>>>
>>>>>>> I've updated the UnknownType PR
>>>>>>> <https://github.com/apache/iceberg/pull/13445> to first block on
>>>>>>> the complex cases that will require some more discussion. This way we
>>>>>>> can also revisit this after the 1.10.0 release.
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Fokko
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 7, 2025 at 23:56, Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>>
>>>>>>>> edited the subject line as we are into August.
>>>>>>>>
>>>>>>>> We are still waiting for the following two changes for the 1.10.0
>>>>>>>> release
>>>>>>>> * Anton's fix for the data frame join using the same snapshot,
>>>>>>>> which will introduce a slight behavior change in spark 4.0.
>>>>>>>> * unknown type support.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Aug 1, 2025 at 6:56 AM Alexandre Dutra <adu...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Steven,
>>>>>>>>>
>>>>>>>>> A small regression with S3 signing has been reported to me. The
>>>>>>>>> fix is simple:
>>>>>>>>>
>>>>>>>>> https://github.com/apache/iceberg/pull/13718
>>>>>>>>>
>>>>>>>>> Would it be still possible to have it in 1.10 please?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Alex
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jul 31, 2025 at 7:19 PM Steven Wu <stevenz...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> >
>>>>>>>>> > Currently, the 1.10.0 milestone has no open PRs
>>>>>>>>> > https://github.com/apache/iceberg/milestone/54
>>>>>>>>> >
>>>>>>>>> > The variant PRs were merged this week and last week. There are still
>>>>>>>>> some variant testing related PRs, which are probably not blockers for
>>>>>>>>> 1.10.0 release.
>>>>>>>>> > * Spark variant read:
>>>>>>>>> https://github.com/apache/iceberg/pull/13219
>>>>>>>>> > * use short strings:
>>>>>>>>> https://github.com/apache/iceberg/pull/13284
>>>>>>>>> >
>>>>>>>>> > We are still waiting for the following two changes
>>>>>>>>> > * Anton's fix for the data frame join using the same snapshot,
>>>>>>>>> which will introduce a slight behavior change in spark 4.0.
>>>>>>>>> > * unknown type support. Fokko raised a discussion thread on a
>>>>>>>>> blocking issue.
>>>>>>>>> >
>>>>>>>>> > Did I miss anything else?
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Sat, Jul 26, 2025 at 5:52 AM Fokko Driesprong <
>>>>>>>>> fo...@apache.org> wrote:
>>>>>>>>> >>
>>>>>>>>> >> Hey all,
>>>>>>>>> >>
>>>>>>>>> >> The read path for the UnknownType needs some community
>>>>>>>>> discussion. I've raised a separate thread. PTAL
>>>>>>>>> >>
>>>>>>>>> >> Kind regards from Belgium,
>>>>>>>>> >> Fokko
>>>>>>>>> >>
>>>>>>>>> >> On Sat, Jul 26, 2025 at 00:58, Ryan Blue <rdb...@gmail.com> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> I thought that we said we wanted to get support out for v3
>>>>>>>>> features in this release unless there is some reasonable blocker, like
>>>>>>>>> Spark not having geospatial types. To me, I think that means we
>>>>>>>>> should aim to get variant and unknown done so that we have a complete
>>>>>>>>> implementation with a major engine. And it should not be particularly
>>>>>>>>> difficult to get unknown done so I'd opt to get it in.
>>>>>>>>> >>>
>>>>>>>>> >>> On Fri, Jul 25, 2025 at 11:28 AM Steven Wu <
>>>>>>>>> stevenz...@gmail.com> wrote:
>>>>>>>>> >>>>
>>>>>>>>> >>>> > I believe we also wanted to get in at least the read path
>>>>>>>>> for UnknownType. Fokko has a WIP PR for that.
>>>>>>>>> >>>> I thought the consensus in the community sync was that this is
>>>>>>>>> not a blocker, because it is a new feature implementation. If it is
>>>>>>>>> ready, it will be included.
>>>>>>>>> >>>>
>>>>>>>>> >>>> On Fri, Jul 25, 2025 at 9:43 AM Kevin Liu <
>>>>>>>>> kevinjq...@apache.org> wrote:
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> I think Fokko's OOO. Should we help with that PR?
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> On Fri, Jul 25, 2025 at 9:38 AM Eduard Tudenhöfner <
>>>>>>>>> etudenhoef...@apache.org> wrote:
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> I believe we also wanted to get in at least the read path
>>>>>>>>> for UnknownType. Fokko has a WIP PR for that.
>>>>>>>>> >>>>>>
>>>>>>>>> >>>>>> On Fri, Jul 25, 2025 at 6:13 PM Steven Wu <
>>>>>>>>> stevenz...@gmail.com> wrote:
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> 3. Spark: fix data frame join based on different versions
>>>>>>>>> of the same table that may lead to weird results. Anton is working on
>>>>>>>>> a fix. It requires a small behavior change (table state may be stale
>>>>>>>>> up to refresh interval). Hence it is better to include it in the
>>>>>>>>> 1.10.0 release where Spark 4.0 is first supported.
>>>>>>>>> >>>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks this
>>>>>>>>> is very close and will prioritize the review.
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> We still have the above two issues pending. 3 doesn't have
>>>>>>>>> a PR yet. PR for 4 is not associated with the milestone yet.
>>>>>>>>> >>>>>>>
>>>>>>>>> >>>>>>> On Fri, Jul 25, 2025 at 9:02 AM Kevin Liu <
>>>>>>>>> kevinjq...@apache.org> wrote:
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> Thanks everyone for the review. The 2 PRs are both merged.
>>>>>>>>> >>>>>>>> Looks like there's only 1 PR left in the 1.10 milestone :)
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> Best,
>>>>>>>>> >>>>>>>> Kevin Liu
>>>>>>>>> >>>>>>>>
>>>>>>>>> >>>>>>>> On Thu, Jul 24, 2025 at 7:44 PM Manu Zhang <
>>>>>>>>> owenzhang1...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> Thanks Kevin. The first change is not in the versioned
>>>>>>>>> doc so it can be released anytime.
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> Regards,
>>>>>>>>> >>>>>>>>> Manu
>>>>>>>>> >>>>>>>>>
>>>>>>>>> >>>>>>>>> On Fri, Jul 25, 2025 at 4:21 AM Kevin Liu <
>>>>>>>>> kevinjq...@apache.org> wrote:
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> The 3 PRs above are merged. Thanks everyone for the
>>>>>>>>> review.
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> I've added 2 more PRs to the 1.10 milestone. These are
>>>>>>>>> both nice-to-haves.
>>>>>>>>> >>>>>>>>>> - docs: add subpage for REST Catalog Spec in
>>>>>>>>> "Specification" #13521
>>>>>>>>> >>>>>>>>>> - REST-Fixture: Ensure strict mode on jdbc catalog for
>>>>>>>>> rest fixture #13599
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> The first one changes the link for "REST Catalog Spec"
>>>>>>>>> on the left nav of https://iceberg.apache.org/spec/ from the
>>>>>>>>> swagger.io link to a dedicated page for IRC.
>>>>>>>>> >>>>>>>>>> The second one fixes the default behavior of
>>>>>>>>> `iceberg-rest-fixture` image to align with the general expectation
>>>>>>>>> when creating a table in a catalog.
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> Please take a look. I would like to have both of these
>>>>>>>>> as part of the 1.10 release.
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> Best,
>>>>>>>>> >>>>>>>>>> Kevin Liu
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>>
>>>>>>>>> >>>>>>>>>> On Wed, Jul 23, 2025 at 1:31 PM Kevin Liu <
>>>>>>>>> kevinjq...@apache.org> wrote:
>>>>>>>>> >>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>> Here are the 3 PRs to add corresponding tests.
>>>>>>>>> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13648
>>>>>>>>> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13649
>>>>>>>>> >>>>>>>>>>> https://github.com/apache/iceberg/pull/13650
>>>>>>>>> >>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>> I've tagged them with the 1.10 milestone, waiting for
>>>>>>>>> CI to complete :)
>>>>>>>>> >>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>> Best,
>>>>>>>>> >>>>>>>>>>> Kevin Liu
>>>>>>>>> >>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>> On Wed, Jul 23, 2025 at 1:08 PM Steven Wu <
>>>>>>>>> stevenz...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>> Kevin, thanks for checking that. I will take a look
>>>>>>>>> at your backport PRs. Can you add them to the 1.10.0 milestone?
>>>>>>>>> >>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>> On Wed, Jul 23, 2025 at 12:27 PM Kevin Liu <
>>>>>>>>> kevinjq...@apache.org> wrote:
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>> Thanks again for driving this Steven! We're very
>>>>>>>>> close!!
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>> As mentioned in the community sync today, I wanted
>>>>>>>>> to verify feature parity between Spark 3.5 and Spark 4.0 for this
>>>>>>>>> release.
>>>>>>>>> >>>>>>>>>>>>> I was able to verify that Spark 3.5 and Spark 4.0
>>>>>>>>> have feature parity for this upcoming release. More details in the
>>>>>>>>> other devlist thread
>>>>>>>>> https://lists.apache.org/thread/7x7xcm3y87y81c4grq4nn9gdjd4jm05f
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>> Thanks,
>>>>>>>>> >>>>>>>>>>>>> Kevin Liu
>>>>>>>>> >>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>> On Wed, Jul 23, 2025 at 12:17 PM Steven Wu <
>>>>>>>>> stevenz...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> Another update on the release.
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> The existing blocker PRs are almost done.
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> During today's community sync, we identified the
>>>>>>>>> following issues/PRs to be included in the 1.10.0 release.
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> 1. backport of PR 13100 to the main branch. I have
>>>>>>>>> created a cherry-pick PR for that. There is a one line difference
>>>>>>>>> compared to the original PR due to the removal of the deprecated
>>>>>>>>> RemoveSnapshot class in main branch for 1.10.0 target. Amogh has
>>>>>>>>> suggested using RemoveSnapshots with a single snapshot id, which should
>>>>>>>>> be supported by all REST catalog servers.
>>>>>>>>> >>>>>>>>>>>>>> 2. Flink compaction doesn't support row lineage. Fail
>>>>>>>>> the compaction for V3 tables. I created a PR for that. Will backport
>>>>>>>>> after it is merged.
>>>>>>>>> >>>>>>>>>>>>>> 3. Spark: fix data frame join based on different
>>>>>>>>> versions of the same table that may lead to weird results. Anton is
>>>>>>>>> working on a fix. It requires a small behavior change (table state may
>>>>>>>>> be stale up to refresh interval). Hence it is better to include it in
>>>>>>>>> the 1.10.0 release where Spark 4.0 is first supported.
>>>>>>>>> >>>>>>>>>>>>>> 4. Variant support in core and Spark 4.0. Ryan thinks
>>>>>>>>> this is very close and will prioritize the review.
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> Thanks,
>>>>>>>>> >>>>>>>>>>>>>> steven
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> The 1.10.0 milestone can be found here.
>>>>>>>>> >>>>>>>>>>>>>> https://github.com/apache/iceberg/milestone/54
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>> On Wed, Jul 16, 2025 at 9:15 AM Steven Wu <
>>>>>>>>> stevenz...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>> Ajantha/Robin, thanks for the note. We can include
>>>>>>>>> the PR in the 1.10.0 milestone.
>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Jul 16, 2025 at 3:20 AM Robin Moffatt
>>>>>>>>> <ro...@confluent.io.invalid> wrote:
>>>>>>>>> >>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>> Thanks Ajantha. Just to confirm, from a Confluent
>>>>>>>>> point of view, we will not be able to publish the connector on
>>>>>>>>> Confluent Hub until this CVE[1] is fixed.
>>>>>>>>> >>>>>>>>>>>>>>>> Since we would not publish a snapshot build, if
>>>>>>>>> the fix doesn't make it into 1.10 then we'd have to wait for 1.11 (or
>>>>>>>>> a dot release of 1.10) to be able to include the connector on
>>>>>>>>> Confluent Hub.
>>>>>>>>> >>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, Robin.
>>>>>>>>> >>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>> [1]
>>>>>>>>> https://github.com/apache/iceberg/issues/10745#issuecomment-3074300861
>>>>>>>>> >>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, 16 Jul 2025 at 04:03, Ajantha Bhat <
>>>>>>>>> ajanthab...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>> I have approached Confluent people to help us
>>>>>>>>> publish the OSS Kafka Connect Iceberg sink plugin.
>>>>>>>>> >>>>>>>>>>>>>>>>> It seems we have a CVE from a dependency that
>>>>>>>>> blocks us from publishing the plugin.
>>>>>>>>> >>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>> Please include the below PR for 1.10.0 release
>>>>>>>>> which fixes that.
>>>>>>>>> >>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/13561
>>>>>>>>> >>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>> - Ajantha
>>>>>>>>> >>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <
>>>>>>>>> stevenz...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>> > Engines may model operations as
>>>>>>>>> deleting/inserting rows or as modifications to rows that preserve row 
>>>>>>>>> ids.
>>>>>>>>> >>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>> Manu, I agree this sentence probably lacks some
>>>>>>>>> context. The first half (as deleting/inserting rows) is probably
>>>>>>>>> about the row lineage handling with equality deletes, which is
>>>>>>>>> described in another place.
>>>>>>>>> >>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>> "Row lineage does not track lineage for rows
>>>>>>>>> updated via Equality Deletes, because engines using equality deletes
>>>>>>>>> avoid reading existing data before writing changes and can't provide
>>>>>>>>> the original row ID for the new rows. These updates are always treated
>>>>>>>>> as if the existing row was completely removed and a unique new row was
>>>>>>>>> added."
>>>>>>>>> >>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <
>>>>>>>>> owenzhang1...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks Steven, I missed that part but the
>>>>>>>>> following sentence is a bit hard to understand (maybe just me)
>>>>>>>>> >>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>> Engines may model operations as
>>>>>>>>> deleting/inserting rows or as modifications to rows that preserve row 
>>>>>>>>> ids.
>>>>>>>>> >>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>> Can you please help to explain?
>>>>>>>>> >>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Tue, Jul 15, 2025 at 04:41, Steven Wu
>>>>>>>>> <stevenz...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Manu
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The spec already covers the row lineage carry
>>>>>>>>> over (for replace)
>>>>>>>>> >>>>>>>>>>>>>>>>>>>> https://iceberg.apache.org/spec/#row-lineage
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>> "When an existing row is moved to a different
>>>>>>>>> data file for any reason, writers should write _row_id and
>>>>>>>>> _last_updated_sequence_number according to the following rules:"
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Steven
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <
>>>>>>>>> stevenz...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> another update on the release.
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> We have one open PR left for the 1.10.0
>>>>>>>>> milestone (with 25 closed PRs). Amogh is actively working on the last
>>>>>>>>> blocker PR.
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Spark 4.0: Preserve row lineage information
>>>>>>>>> on compaction
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I will publish a release candidate after the
>>>>>>>>> above blocker is merged and backported.
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Steven
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <
>>>>>>>>> owenzhang1...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hi Amogh,
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Is it defined in the table spec that the
>>>>>>>>> "replace" operation should carry over existing lineage info instead of
>>>>>>>>> assigning new IDs? If not, we'd better first define it in the spec,
>>>>>>>>> because all engines and implementations need to follow it.
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh
>>>>>>>>> Jahagirdar <2am...@gmail.com> wrote:
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> One other area I think we need to make
>>>>>>>>> sure works with row lineage before release is data file compaction.
>>>>>>>>> At the moment, it looks like compaction will read the records from the
>>>>>>>>> data files without projecting the lineage fields. What this means is
>>>>>>>>> that on write of the new compacted data files we'd be losing the
>>>>>>>>> lineage information. There's no data change in a compaction but we do
>>>>>>>>> need to make sure the lineage info from carried over records is
>>>>>>>>> materialized in the newly compacted files so they don't get new IDs or
>>>>>>>>> inherit the new file sequence number. I'm working on addressing this as
>>>>>>>>> well, but I'd call this out as a blocker as well.
>>>>>>>>> >>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>>
>>>>>>>>> >>>>>>>>>>>>>>>> --
>>>>>>>>> >>>>>>>>>>>>>>>> Robin Moffatt
>>>>>>>>> >>>>>>>>>>>>>>>> Sr. Principal Advisor, Streaming Data Technologies
>>>>>>>>>
>>>>>>>>
>>
>
