I have approached Confluent people <https://github.com/apache/iceberg/issues/10745#issuecomment-3058281281> to help us publish the OSS Kafka Connect Iceberg sink plugin. It seems a CVE in a dependency is blocking us from publishing the plugin.
Please include the below PR in the 1.10.0 release, which fixes that:
https://github.com/apache/iceberg/pull/13561

- Ajantha

On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com> wrote:

> > Engines may model operations as deleting/inserting rows or as
> > modifications to rows that preserve row ids.
>
> Manu, I agree this sentence probably lacks some context. The first half
> (as deleting/inserting rows) is probably about the row lineage handling
> with equality deletes, which is described in another place:
>
> "Row lineage does not track lineage for rows updated via Equality Deletes
> <https://iceberg.apache.org/spec/#equality-delete-files>, because engines
> using equality deletes avoid reading existing data before writing changes
> and can't provide the original row ID for the new rows. These updates are
> always treated as if the existing row was completely removed and a unique
> new row was added."
>
> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com>
> wrote:
>
>> Thanks Steven, I missed that part, but the following sentence is a bit
>> hard to understand (maybe it's just me):
>>
>> Engines may model operations as deleting/inserting rows or as
>> modifications to rows that preserve row ids.
>>
>> Can you please help to explain?
>>
>> On Tue, Jul 15, 2025 at 04:41, Steven Wu <stevenz...@gmail.com> wrote:
>>
>>> Manu,
>>>
>>> The spec already covers the row lineage carry-over (for replace):
>>> https://iceberg.apache.org/spec/#row-lineage
>>>
>>> "When an existing row is moved to a different data file for any reason,
>>> writers should write _row_id and _last_updated_sequence_number according
>>> to the following rules:"
>>>
>>> Thanks,
>>> Steven
>>>
>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>
>>>> Another update on the release.
>>>>
>>>> We have one open PR left for the 1.10.0 milestone
>>>> <https://github.com/apache/iceberg/milestone/54> (with 25 closed PRs).
>>>> Amogh is actively working on the last blocker PR.
>>>> Spark 4.0: Preserve row lineage information on compaction
>>>> <https://github.com/apache/iceberg/pull/13555>
>>>>
>>>> I will publish a release candidate after the above blocker is merged
>>>> and backported.
>>>>
>>>> Thanks,
>>>> Steven
>>>>
>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Amogh,
>>>>>
>>>>> Is it defined in the table spec that the "replace" operation should
>>>>> carry over existing lineage info instead of assigning new IDs? If not,
>>>>> we'd better first define it in the spec, because all engines and
>>>>> implementations need to follow it.
>>>>>
>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> One other area I think we need to make sure works with row lineage
>>>>>> before the release is data file compaction. At the moment
>>>>>> <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44>,
>>>>>> it looks like compaction reads the records from the data files without
>>>>>> projecting the lineage fields. This means that when the new compacted
>>>>>> data files are written, we would lose the lineage information. There is
>>>>>> no data change in a compaction, but we do need to make sure the lineage
>>>>>> info from carried-over records is materialized in the newly compacted
>>>>>> files so they don't get new IDs or inherit the new file's sequence
>>>>>> number. I'm working on addressing this, and I'd call it out as a
>>>>>> blocker as well.
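For readers following along, the inheritance rule Steven and Amogh reference can be illustrated with a small simulation. This is a hedged sketch of the spec's semantics, not Iceberg code: the field names (`_row_id`, `_last_updated_sequence_number`, `first_row_id`) come from the row-lineage section of the spec, while `resolve_lineage` is a hypothetical helper written for this example.

```python
def resolve_lineage(rows, first_row_id, file_sequence_number):
    """Resolve inherited lineage values for rows read from one data file.

    Each row is a dict; a missing/None '_row_id' or
    '_last_updated_sequence_number' means "inherit from the file".
    """
    resolved = []
    for pos, row in enumerate(rows):
        row_id = row.get("_row_id")
        seq = row.get("_last_updated_sequence_number")
        resolved.append({
            **row,
            # A null _row_id inherits first_row_id + position in the file.
            "_row_id": first_row_id + pos if row_id is None else row_id,
            # A null last-updated sequence number inherits the file's
            # data sequence number.
            "_last_updated_sequence_number": (
                file_sequence_number if seq is None else seq
            ),
        })
    return resolved


# A writer that rewrites these rows into a new file (e.g. compaction)
# must materialize the *resolved* values. If the lineage columns are not
# projected on read, as in the compaction issue described above, the rows
# would wrongly re-inherit fresh IDs and the new file's sequence number.
old_file_rows = [
    {"data": "a"},  # lineage values never materialized: inherit
    {"data": "b", "_row_id": 7, "_last_updated_sequence_number": 3},
]
carried_over = resolve_lineage(
    old_file_rows, first_row_id=100, file_sequence_number=5
)
print(carried_over[0]["_row_id"], carried_over[1]["_row_id"])  # 100 7
```

This also shows why equality deletes fall outside row lineage: a writer that never reads the existing row has no resolved `_row_id` to carry forward, so the update is modeled as a delete plus a brand-new row.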