pvary commented on PR #14435:
URL: https://github.com/apache/iceberg/pull/14435#issuecomment-3540594805

   I agree with @Guosmilesmile that for V3 tables, row lineage tracking is 
broken because the original files don’t contain row_ids; the rows inherit them 
later.
   Consider this scenario:
   - Commit 1 adds File 1 (50 rows), sets first_row_id = 0, but doesn’t write 
row_ids into the file. The inherited row_ids are 0, 1, 2, 3, ..., 49.
   - Commit 2 adds File 2 (50 rows), sets first_row_id = 50, but doesn’t write 
row_ids into the file. The inherited row_ids are 50, 51, 52, 53, ..., 99.
   - Commit 3 adds File 3 (50 rows), sets first_row_id = 100, but doesn’t 
write row_ids into the file. The inherited row_ids are 100, 101, 102, 103, ..., 
149.
   - Commit 4 performs compaction, merging File 1 and File 2 into a new file 
(100 rows). The commit sets first_row_id = 150, and since the new file doesn’t 
contain row_ids, they are assigned by the current algorithm starting from 150. 
The row_ids are 150, 151, 152, 153, ..., 249.
   
   As a result, the compacted rows receive new row_ids, which breaks lineage 
tracking.
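
   The scenario can be reproduced with a small simulation. This is only a 
sketch: it assumes a row with no stored row_id inherits first_row_id plus its 
position in the file, and that each commit takes first_row_id from a running 
table-level counter. The names DataFile, commit, and next_row_id are 
hypothetical and are not Iceberg APIs.

```python
# Simplified model of row_id inheritance; illustrative only, not Iceberg code.
from dataclasses import dataclass


@dataclass
class DataFile:
    name: str
    record_count: int
    first_row_id: int

    def inherited_row_ids(self) -> range:
        # Assumption: rows without a stored row_id get first_row_id + position.
        return range(self.first_row_id, self.first_row_id + self.record_count)


def commit(name: str, record_count: int, next_row_id: int):
    """Assign first_row_id from the running counter and advance the counter."""
    return DataFile(name, record_count, next_row_id), next_row_id + record_count


next_row_id = 0
file1, next_row_id = commit("File 1", 50, next_row_id)
file2, next_row_id = commit("File 2", 50, next_row_id)
file3, next_row_id = commit("File 3", 50, next_row_id)

# Compaction rewrites File 1 and File 2 into one file that stores no row_ids,
# so the rewritten rows fall back to inheritance from the new first_row_id.
compacted, next_row_id = commit(
    "Compacted", file1.record_count + file2.record_count, next_row_id
)

before = list(file1.inherited_row_ids()) + list(file2.inherited_row_ids())
after = list(compacted.inherited_row_ids())

print(before[0], before[-1])  # 0 99
print(after[0], after[-1])    # 150 249
```

   In this model the sets of row_ids before and after compaction do not 
overlap, so none of the rewritten rows keeps its original identifier.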


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

