laskoviymishka opened a new issue, #998:
URL: https://github.com/apache/iceberg-go/issues/998

   Parent: #589
   
   The metadata side of row lineage landed in #659 / #728 / #735 / #741 (closed 
#727). What's missing is the write side: iceberg-go writes data files but 
doesn't populate per-row `_row_id` for new appends.
   
   Per spec: the writer assigns each new row a row ID starting at the current 
`next-row-id`, the manifest entry's `first_row_id` is the smallest assigned ID, 
the manifest list records a starting `first_row_id` per manifest, and metadata 
`next-row-id` advances by the total row count at commit time.
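
The assignment rule above can be sketched as follows. This is a minimal illustration, not iceberg-go code: `stagedFile` and `assignRowIDs` are hypothetical names, and the sketch assumes files are processed sequentially in commit order.

```go
package main

import "fmt"

// stagedFile is a hypothetical stand-in for a data file awaiting commit.
type stagedFile struct {
	Path        string
	RecordCount int64
	FirstRowID  int64
}

// assignRowIDs gives each file a contiguous row-ID range starting at
// nextRowID and returns the value next-row-id should advance to.
func assignRowIDs(nextRowID int64, files []stagedFile) int64 {
	for i := range files {
		files[i].FirstRowID = nextRowID
		nextRowID += files[i].RecordCount
	}
	return nextRowID
}

func main() {
	files := []stagedFile{
		{Path: "a.parquet", RecordCount: 100},
		{Path: "b.parquet", RecordCount: 50},
	}
	next := assignRowIDs(1000, files)
	for _, f := range files {
		fmt.Printf("%s first_row_id=%d\n", f.Path, f.FirstRowID)
	}
	// a.parquet first_row_id=1000
	// b.parquet first_row_id=1100
	fmt.Println("next-row-id:", next) // next-row-id: 1150
}
```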
   
   Two questions worth nailing down in this thread before the code goes in:
   
   - **Where row-id assignment happens**: at the writer (each task pre-claims 
a range) or at commit time (post-processing the staged files). Per-task 
pre-claim is simpler under sequential commits but needs care under retries, 
since a retried commit may see a different `next-row-id`. Java assigns at the 
writer using a pre-claimed range; we should mirror that unless we have a 
reason to diverge.
   - **How parallel writers coordinate**: each writer claims its range at task 
setup, and commit ordering settles the actual `first_row_id` per file.
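
To make the pre-claim option concrete, here is a rough in-process sketch of writers claiming disjoint ranges from a shared counter. `rowIDAllocator` and `claim` are hypothetical names, not existing iceberg-go APIs, and the sketch assumes each writer knows its row count when it claims. It does not cover commit retries, where claimed ranges may need to be reassigned.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// rowIDAllocator hands out disjoint row-ID ranges to concurrent writers.
type rowIDAllocator struct{ next atomic.Int64 }

func newRowIDAllocator(start int64) *rowIDAllocator {
	a := &rowIDAllocator{}
	a.next.Store(start)
	return a
}

// claim reserves n consecutive row IDs and returns the first one.
func (a *rowIDAllocator) claim(n int64) int64 {
	return a.next.Add(n) - n
}

func main() {
	alloc := newRowIDAllocator(1000)
	counts := []int64{100, 50, 25}
	starts := make([]int64, len(counts))

	var wg sync.WaitGroup
	for i, n := range counts {
		wg.Add(1)
		go func(i int, n int64) {
			defer wg.Done()
			starts[i] = alloc.claim(n) // which writer gets which range depends on scheduling
		}(i, n)
	}
	wg.Wait()

	fmt.Println("next-row-id:", alloc.next.Load()) // next-row-id: 1175
}
```

The ranges never overlap, but which writer gets which range is nondeterministic, which is why the final `first_row_id` per file still depends on commit ordering.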
   
   Touches: `table/internal/parquet_files.go` (writer), 
`table/snapshot_producers.go` (commit-time `next-row-id` advance), 
`table/metadata.go` (consistency check).
   
   Tests: scan after a v3 append asserts `_row_id` increments contiguously 
across the new files; metadata `next-row-id` advances by exactly the row count 
of the snapshot. Spec: [Iceberg row 
lineage](https://iceberg.apache.org/spec/#row-lineage). Java parity references: 
[apache/iceberg#15187](https://github.com/apache/iceberg/pull/15187), 
[#15508](https://github.com/apache/iceberg/pull/15508).
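
The contiguity assertion in the test plan could look roughly like this. `checkContiguous` is a hypothetical helper, and the sketch assumes the scan returns `_row_id` values as an `[]int64` in file order.

```go
package main

import "fmt"

// checkContiguous reports the first position where ids deviates from
// the sequence start, start+1, start+2, ...
func checkContiguous(start int64, ids []int64) error {
	for i, id := range ids {
		if want := start + int64(i); id != want {
			return fmt.Errorf("row %d: got _row_id %d, want %d", i, id, want)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkContiguous(1000, []int64{1000, 1001, 1002})) // <nil>
	fmt.Println(checkContiguous(1000, []int64{1000, 1002}))       // row 1: got _row_id 1002, want 1001
}
```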


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

