laskoviymishka opened a new issue, #998: URL: https://github.com/apache/iceberg-go/issues/998
Parent: #589

The metadata side of row lineage landed in #659 / #728 / #735 / #741 (closed #727). What's missing is the write side: iceberg-go writes data files but doesn't populate per-row `_row_id` for new appends.

Per spec: the writer assigns each new row a row ID starting at the current `next-row-id`, the manifest entry's `first_row_id` is the smallest assigned ID, the manifest list aggregates, and metadata `next-row-id` advances by the total row count at commit time.

Two questions worth nailing down in this thread before the code goes in:

- **Where row-id assignment happens**: at the writer (per-task pre-claim of a range) or at commit time (post-process the staged files). Per-task pre-claim is simpler under sequential commits but needs care under retries. Java assigns at the writer using a pre-claimed range; mirror unless we have a reason to diverge.
- **How parallel writers coordinate**: each writer claims its range at task setup; commit ordering settles the actual `first_row_id` per file.

Touches: `table/internal/parquet_files.go` (writer), `table/snapshot_producers.go` (commit-time `next-row-id` advance), `table/metadata.go` (consistency check).

Tests: scan after a v3 append asserts `_row_id` increments contiguously across the new files; metadata `next-row-id` advances by exactly the row count of the snapshot.

Spec: [Iceberg row lineage](https://iceberg.apache.org/spec/#row-lineage).

Java parity references: [apache/iceberg#15187](https://github.com/apache/iceberg/pull/15187), [#15508](https://github.com/apache/iceberg/pull/15508).
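To make the arithmetic in the issue concrete, here is a minimal Go sketch of the commit-time variant: walk the staged files in commit order, give each one a `first_row_id`, and return the advanced `next-row-id`. The `DataFile` struct and the `assignFirstRowIDs` and `rowID` helpers are hypothetical names for illustration only; they are not iceberg-go types or APIs. The sketch also ignores manifest-level aggregation and retry handling.

```go
package main

import "errors"

// DataFile is a hypothetical stand-in for a staged data file.
// It is NOT the iceberg-go DataFile interface.
type DataFile struct {
	Path        string
	RecordCount int64
	FirstRowID  *int64 // nil until a row-ID range has been assigned
}

// assignFirstRowIDs walks the staged files in commit order, assigns each
// file a contiguous row-ID range starting at nextRowID, and returns the
// new next-row-id (the old value plus the total row count).
func assignFirstRowIDs(nextRowID int64, files []*DataFile) (int64, error) {
	if nextRowID < 0 {
		return 0, errors.New("next-row-id must be non-negative")
	}
	cur := nextRowID
	for _, f := range files {
		if f.RecordCount < 0 {
			return 0, errors.New("negative record count in " + f.Path)
		}
		id := cur // copy so each file gets its own pointer target
		f.FirstRowID = &id
		cur += f.RecordCount
	}
	return cur, nil
}

// rowID derives the _row_id of the row at zero-based position pos in f
// as first_row_id + pos.
func rowID(f *DataFile, pos int64) (int64, error) {
	if f.FirstRowID == nil {
		return 0, errors.New("file has no first_row_id assigned")
	}
	if pos < 0 || pos >= f.RecordCount {
		return 0, errors.New("position out of range")
	}
	return *f.FirstRowID + pos, nil
}
```

For example, with `next-row-id` at 100 and two staged files of 3 and 5 rows, this sketch gives the files `first_row_id` values of 100 and 103 and returns 108 as the new `next-row-id`, which matches the "advances by exactly the row count" assertion proposed under Tests. A writer-side pre-claim would reserve the same kind of range earlier, at task setup, and would additionally need to handle ranges claimed by tasks that are retried or never commit.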
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
