laskoviymishka opened a new pull request, #789:
URL: https://github.com/apache/iceberg-go/pull/789

   Adds `Transaction.NewRowDelta()` — Go equivalent of Java's `BaseRowDelta`. 
Commits data files and delete files (position or equality) in one atomic 
snapshot. This is needed for row-level mutations: an UPDATE becomes an equality 
delete for the old row + append of the new row, both in one commit.
   
   Resolves #602.
   
   ## API
   
   ```go
   rd := tx.NewRowDelta(snapshotProps)
   rd.AddRows(dataFile1, dataFile2)
   rd.AddDeletes(posDeleteFile, eqDeleteFile)
   rd.Commit(ctx)
   ```
   
   Operation type picked automatically: data-only → `append`, deletes-only → 
`delete`, both → `overwrite`.
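The selection rule above can be sketched as a small decision function. This is an illustrative sketch only; `chooseOperation` and its counting inputs are hypothetical names, not the PR's actual internals.

```go
package main

import "fmt"

// chooseOperation mirrors the rule described above: data-only commits are
// appends, delete-only commits are deletes, and mixed commits are overwrites.
// (Hypothetical helper for illustration, not iceberg-go API.)
func chooseOperation(numDataFiles, numDeleteFiles int) string {
	switch {
	case numDeleteFiles == 0:
		return "append"
	case numDataFiles == 0:
		return "delete"
	default:
		return "overwrite"
	}
}

func main() {
	fmt.Println(chooseOperation(2, 0)) // append
	fmt.Println(chooseOperation(0, 1)) // delete
	fmt.Println(chooseOperation(1, 1)) // overwrite
}
```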
   
   ## Validation
   
   - Delete files require format version >= 2
   - Equality deletes must have non-empty `EqualityFieldIDs` referencing 
existing schema columns
   - Content types checked: no data files in `AddDeletes`, no delete files in 
`AddRows`
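The equality-delete checks can be sketched like this. The struct and function names here are assumptions for illustration; they are not iceberg-go's real types.

```go
package main

import "fmt"

// eqDeleteFile is a stand-in for a delete file's metadata; only the field
// relevant to the validation rule is shown. (Illustrative, not the real type.)
type eqDeleteFile struct {
	EqualityFieldIDs []int
}

// validateEqDelete enforces the two rules from the Validation section:
// the field ID list must be non-empty, and every ID must exist in the schema.
func validateEqDelete(f eqDeleteFile, schemaFieldIDs map[int]bool) error {
	if len(f.EqualityFieldIDs) == 0 {
		return fmt.Errorf("equality delete file must set EqualityFieldIDs")
	}
	for _, id := range f.EqualityFieldIDs {
		if !schemaFieldIDs[id] {
			return fmt.Errorf("equality field ID %d not found in schema", id)
		}
	}
	return nil
}

func main() {
	schema := map[int]bool{1: true, 2: true}
	fmt.Println(validateEqDelete(eqDeleteFile{EqualityFieldIDs: []int{1}}, schema))   // <nil>
	fmt.Println(validateEqDelete(eqDeleteFile{}, schema) != nil)                      // true
	fmt.Println(validateEqDelete(eqDeleteFile{EqualityFieldIDs: []int{999}}, schema)) // error
}
```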
   
   ## Known limitations
   
   - No conflict detection for concurrent writers — documented in the type 
comment
   - Uses fast-append producer (no manifest merging)
   
   ## What's tested
   
   The notable cases:
   
   - Commit data + position deletes, check snapshot summary has 
`added-data-files=1`, `added-delete-files=1`, operation is `overwrite`
   - Commit equality deletes, check `added-equality-delete-files` shows up in 
summary
   - Read back manifests after commit, verify there's one data manifest and one 
delete manifest with correct content types in entries
   - Two RowDeltas on same transaction (batch1 append, batch2 append+delete), 
verify cumulative `total-data-files`
   - v1 table rejects delete files with clear error
   - Equality delete file without field IDs → error
   - Equality delete file with field ID 999 (not in schema) → error
   
   The round-trip integration test:
   1. Write 5 rows as real Parquet, append to table
   2. Write a position delete file targeting positions 1 and 3, commit via 
RowDelta
   3. Scan the table back — 3 rows remain with IDs `[1, 3, 5]`; the rows at 
positions 1 and 3 (beta and delta) are gone
   
   This covers the full path: write parquet → RowDelta commit → scan with 
position delete filtering applied.
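The filtering step at the end of that round trip boils down to dropping rows whose ordinal position appears in the delete set. A minimal sketch (plain slices standing in for Arrow batches; `applyPositionDeletes` is a hypothetical name):

```go
package main

import "fmt"

// applyPositionDeletes keeps only the rows whose position is not in the
// delete set, mirroring what the scanner does with position delete files.
func applyPositionDeletes(rows []string, deleted map[int64]bool) []string {
	var kept []string
	for pos, row := range rows {
		if !deleted[int64(pos)] {
			kept = append(kept, row)
		}
	}
	return kept
}

func main() {
	rows := []string{"alpha", "beta", "gamma", "delta", "epsilon"}
	// Deleting positions 1 and 3, as in the integration test,
	// removes beta and delta.
	fmt.Println(applyPositionDeletes(rows, map[int64]bool{1: true, 3: true}))
	// [alpha gamma epsilon]
}
```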
   
   ## What's left to do
   
   This PR covers the commit API. Remaining work for full DML support:
   
   - **Equality delete file writing** — a writer that produces Parquet files 
with PK-only schema and `EntryContentEqDeletes` content type. The RowDelta API 
already accepts them, but there's no convenient writer yet.
   - **Equality delete reading** — the scanner currently errors with 
"iceberg-go does not yet support equality deletes" (`scanner.go:415`). Needs: 
collect eq delete entries during scan planning, match to data files by 
partition + sequence number, apply hash-based anti-join during Arrow reads.
   - **Conflict validation** — `validateFromSnapshot`, 
`validateNoConflictingDataFiles`, etc. Java's Flink connector skips most of 
this for streaming, so it's not blocking for CDC use cases.
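For the equality-delete reading item, the hash-based anti-join amounts to building a set of delete keys and dropping matching data rows. A sketch under simplifying assumptions (string keys instead of composite equality-field tuples, plain slices instead of Arrow record batches; `antiJoin` is an illustrative name):

```go
package main

import "fmt"

// antiJoin returns the data keys that do NOT appear among the delete keys:
// build a hash set from the (usually smaller) delete side, then stream the
// data side through it. The real scanner would also restrict the join to
// delete files whose partition and sequence number cover the data file.
func antiJoin(dataKeys, deleteKeys []string) []string {
	deleted := make(map[string]bool, len(deleteKeys))
	for _, k := range deleteKeys {
		deleted[k] = true
	}
	var kept []string
	for _, k := range dataKeys {
		if !deleted[k] {
			kept = append(kept, k)
		}
	}
	return kept
}

func main() {
	fmt.Println(antiJoin([]string{"a", "b", "c"}, []string{"b"})) // [a c]
}
```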
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]