laskoviymishka commented on issue #999: URL: https://github.com/apache/iceberg-go/issues/999#issuecomment-4471781003
I’d fold PR (2) into PR (1). A standalone PR for one test plus a comment on an unchanged line feels a little too small, it creates reviewer context-switch without much benefit. You’ll need that test anyway to prove the CoW rewrite is correct end-to-end, so I’d include it there: add the regression test in PR (1), and leave a short comment near `snapshot_producers.go:920` explaining that the overcount is intentional. So I’d make it two PRs total: 1. CoW rewrite + accounting regression test/comment 2. MoR insert/reinsert API Overall, the approach looks right to me. On Q1: agreed, `WithPreserveRowLineage()` keeping the snapshot producer unaware seems like the right shape, and it reuses `synthesizeRowLineageColumns` cleanly. On Q2: good catch. Preserved IDs are always below `firstRowID`, so advancing past the full manifest count can’t collide; it just leaves a hole. Java does the same thing, so this is fine. On Q3: insert/reinsert matching `SparkPositionDeltaWrite` makes sense. Nulling `_last_updated_sequence_number` on reinsert is also correct: the new file’s `data_sequence_number` will be the new snapshot’s seq#, so inheritance lands in the right place. One small sanity check before starting PR (1): can you confirm that `synthesizeRowLineageColumns` reads the file’s sequence number on the synthesis path, rather than anything cached from the source snapshot? If yes, then nulling on reinsert should behave as intended. Let's 🚢 it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
