Hi Amogh,

Is it defined in the table spec that the "replace" operation should carry
over existing lineage info instead of assigning new IDs? If not, we'd
better define it in the spec first, because all engines and
implementations need to follow it.
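To make the carry-over semantics concrete, here is a toy sketch (not Iceberg code; the `compact` function and record shapes are illustrative assumptions) of what preserving lineage through a rewrite means, using field names modeled on the v3 row-lineage columns `_row_id` and `_last_updated_sequence_number`:

```python
# Toy model of row lineage during compaction. Each record carries two
# lineage fields: _row_id (stable row identity) and
# _last_updated_sequence_number (sequence number of its last data change).

def compact(data_files, new_file_sequence_number):
    """Rewrite records from many files into one, preserving lineage.

    Compaction is not a data change, so each record keeps its existing
    _row_id and _last_updated_sequence_number rather than getting a
    fresh ID or inheriting the rewrite's own sequence number.
    """
    compacted = []
    for records in data_files:
        for rec in records:
            # Carry the lineage fields over verbatim; do NOT reset them
            # to new_file_sequence_number or assign a new _row_id.
            compacted.append(dict(rec))
    return compacted

file_a = [{"id": 1, "_row_id": 100, "_last_updated_sequence_number": 3}]
file_b = [{"id": 2, "_row_id": 101, "_last_updated_sequence_number": 5}]

merged = compact([file_a, file_b], new_file_sequence_number=9)
# Every record keeps its original lineage; none inherit sequence number 9.
assert all(r["_last_updated_sequence_number"] != 9 for r in merged)
```

The bug described below is the opposite behavior: if the rewrite does not project these fields when reading, the rewritten records fall back to inheritance and effectively get new lineage.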

On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2am...@gmail.com> wrote:

> One other area I think we need to make sure works with row lineage before
> release is data file compaction. At the moment,
> <https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackFileRewriteRunner.java#L44>
> it looks like compaction reads the records from the data files without
> projecting the lineage fields, which means that when the new compacted
> data files are written, we'd lose the lineage information. There's no
> data change in a compaction, but we do need to make sure the lineage info
> from the carried-over records is materialized in the newly compacted
> files, so that they don't get new IDs or inherit the new file sequence
> number. I'm working on addressing this, and I'd call it out as a blocker
> as well.
>