stevenzwu opened a new pull request, #16245:
URL: https://github.com/apache/iceberg/pull/16245

   ## Summary
   
   Backport of #15512 to Spark v3.4, v3.5, and v4.0.
   
   When WAP (Write-Audit-Publish) is enabled via `spark.wap.branch`,
   `canDeleteWhere()` and `deleteWhere()` could target different branches:
   
   - `canDeleteWhere()` scanned the branch carried by the table identifier
     (null, which resolves to main), because the WAP branch is only a
     session config and not part of the identifier.
   - `deleteWhere()` resolved the WAP branch before committing.
   
   This could cause `canDeleteWhere()` to incorrectly approve a
   metadata-only delete based on data that was never on the WAP branch,
   surfacing at commit time as:
   
   ```
   ValidationException: Cannot delete file where some, but not all, rows match filter
   ```
   
   ## Fix
   
   `canDeleteWhere()` now resolves the scan branch the same way
   `deleteWhere()` resolves the write branch, with one important
   difference: it falls back to main when WAP is configured but the WAP
   branch has not been created yet, since this is a read scan.
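   
   As a rough illustration of the resolution rule described above, here is a
   minimal, dependency-free sketch. It is not the code in this PR: the class
   `WapBranchSketch` and the methods `writeBranch` and `scanBranch` are
   hypothetical names, the set of existing refs stands in for the table's
   refs, and only the case where the table identifier carries no branch is
   modeled.
   
   ```java
   import java.util.Set;
   
   // Hypothetical sketch; names and signatures are illustrative, not Iceberg APIs.
   final class WapBranchSketch {
     static final String MAIN = "main";
   
     // Write side (assumption): a configured WAP branch is the commit target
     // even when it has not been created yet.
     static String writeBranch(String wapBranch) {
       return wapBranch != null ? wapBranch : MAIN;
     }
   
     // Read side: scan the WAP branch only if it already exists;
     // otherwise fall back to main, since there is nothing to scan yet.
     static String scanBranch(String wapBranch, Set<String> existingRefs) {
       if (wapBranch != null && existingRefs.contains(wapBranch)) {
         return wapBranch;
       }
       return MAIN;
     }
   
     public static void main(String[] args) {
       // WAP configured, branch not created yet: the scan falls back to main.
       System.out.println(scanBranch("audit", Set.of("main")));
       // WAP configured and branch exists: scan and write agree on the branch.
       System.out.println(scanBranch("audit", Set.of("main", "audit")));
       System.out.println(writeBranch("audit"));
     }
   }
   ```
   
   Once the WAP branch exists, the scan and the commit resolve to the same
   branch, so a metadata-only delete is approved against the data it will
   actually be committed to.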
   
   The resolved branch is threaded through `canDeleteUsingMetadata` for
   both the `TableScan.useRef` call and the `SnapshotUtil.schemaFor`
   lookup used by the metrics evaluator.
   
   The v4.1 fix uses the new `SparkTableUtil.determineReadBranch` helper,
   which does not exist in older versions, so the equivalent logic is
   inlined here as a small private helper. Behavior matches the v4.1 fix
   for the scenarios these older versions support (older versions do not
   have option-based read/write branches on `SparkTable`).
   
   ## Test plan
   - [x] `compileJava` and `compileTestJava` pass for v3.4, v3.5, v4.0
   - [ ] CI runs the new tests in each version's `spark-extensions` module:
     - `TestDelete#testDeleteToWapBranchCanDeleteWhereScansWapBranch`
     - `TestDelete#testMetadataDeleteToWapBranchCommitsToWapBranch`
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

