I have approached Confluent people <https://github.com/apache/iceberg/issues/10745#issuecomment-3058281281> to help us publish the OSS Kafka Connect Iceberg sink plugin. It seems a CVE in a dependency is blocking us from publishing the plugin.
Please include the below PR in the 1.10.0 release, which fixes that:
https://github.com/apache/iceberg/pull/13561

- Ajantha

On Tue, Jul 15, 2025 at 10:48 AM Steven Wu <stevenz...@gmail.com> wrote:

> > Engines may model operations as deleting/inserting rows or as
> > modifications to rows that preserve row ids.
>
> Manu, I agree this sentence probably lacks some context. The first half
> (as deleting/inserting rows) is probably about the row lineage handling
> with equality deletes, which is described in another place:
>
> "Row lineage does not track lineage for rows updated via Equality Deletes
> <https://iceberg.apache.org/spec/#equality-delete-files>, because engines
> using equality deletes avoid reading existing data before writing changes
> and can't provide the original row ID for the new rows. These updates are
> always treated as if the existing row was completely removed and a unique
> new row was added."
>
> On Mon, Jul 14, 2025 at 5:49 PM Manu Zhang <owenzhang1...@gmail.com>
> wrote:
>
>> Thanks Steven, I missed that part, but the following sentence is a bit
>> hard to understand (maybe it's just me):
>>
>> Engines may model operations as deleting/inserting rows or as
>> modifications to rows that preserve row ids.
>>
>> Can you please help to explain?
>>
>> On Tue, Jul 15, 2025 at 04:41, Steven Wu <stevenz...@gmail.com> wrote:
>>
>>> Manu,
>>>
>>> The spec already covers the row lineage carry-over (for replace):
>>> https://iceberg.apache.org/spec/#row-lineage
>>>
>>> "When an existing row is moved to a different data file for any reason,
>>> writers should write _row_id and _last_updated_sequence_number according
>>> to the following rules:"
>>>
>>> Thanks,
>>> Steven
>>>
>>> On Mon, Jul 14, 2025 at 1:38 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>
>>>> Another update on the release.
>>>>
>>>> We have one open PR left for the 1.10.0 milestone
>>>> <https://github.com/apache/iceberg/milestone/54> (with 25 closed PRs).
>>>> Amogh is actively working on the last blocker PR.
>>>> Spark 4.0: Preserve row lineage information on compaction
>>>> <https://github.com/apache/iceberg/pull/13555>
>>>>
>>>> I will publish a release candidate after the above blocker is merged
>>>> and backported.
>>>>
>>>> Thanks,
>>>> Steven
>>>>
>>>> On Mon, Jul 7, 2025 at 11:56 PM Manu Zhang <owenzhang1...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Amogh,
>>>>>
>>>>> Is it defined in the table spec that the "replace" operation should
>>>>> carry over existing lineage info instead of assigning new IDs? If not,
>>>>> we'd better first define it in the spec, because all engines and
>>>>> implementations need to follow it.
>>>>>
>>>>> On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> One other area I think we need to make sure works with row lineage
>>>>>> before the release is data file compaction. At the moment
>>>>>> <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44>,
>>>>>> it looks like compaction reads the records from the data files without
>>>>>> projecting the lineage fields. This means that when the new compacted
>>>>>> data files are written, we would lose the lineage information. There is
>>>>>> no data change in a compaction, but we do need to make sure the lineage
>>>>>> info from carried-over records is materialized in the newly compacted
>>>>>> files so they don't get new IDs or inherit the new file's sequence
>>>>>> number. I'm working on addressing this, and I'd call it out as a
>>>>>> blocker as well.
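For readers following along, the inheritance rule Steven and Amogh reference can be illustrated with a small simulation. This is a hedged sketch of the spec's semantics, not Iceberg code: the field names (`_row_id`, `_last_updated_sequence_number`, `first_row_id`) come from the row-lineage section of the spec, while `resolve_lineage` is a hypothetical helper written for this example.

```python
def resolve_lineage(rows, first_row_id, file_sequence_number):
    """Resolve inherited lineage values for rows read from one data file.

    Each row is a dict; a missing/None '_row_id' or
    '_last_updated_sequence_number' means "inherit from the file".
    """
    resolved = []
    for pos, row in enumerate(rows):
        row_id = row.get("_row_id")
        seq = row.get("_last_updated_sequence_number")
        resolved.append({
            **row,
            # A null _row_id inherits first_row_id + position in the file.
            "_row_id": first_row_id + pos if row_id is None else row_id,
            # A null last-updated sequence number inherits the file's
            # data sequence number.
            "_last_updated_sequence_number": (
                file_sequence_number if seq is None else seq
            ),
        })
    return resolved


# A writer that rewrites these rows into a new file (e.g. compaction)
# must materialize the *resolved* values. If the lineage columns are not
# projected on read, as in the compaction issue described above, the rows
# would wrongly re-inherit fresh IDs and the new file's sequence number.
old_file_rows = [
    {"data": "a"},  # lineage values never materialized: inherit
    {"data": "b", "_row_id": 7, "_last_updated_sequence_number": 3},
]
carried_over = resolve_lineage(
    old_file_rows, first_row_id=100, file_sequence_number=5
)
print(carried_over[0]["_row_id"], carried_over[1]["_row_id"])  # 100 7
```

This also shows why equality deletes fall outside row lineage: a writer that never reads the existing row has no resolved `_row_id` to carry forward, so the update is modeled as a delete plus a brand-new row.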