yingjianwu98 opened a new pull request, #15512:
URL: https://github.com/apache/iceberg/pull/15512
**Problem**
When WAP (Write-Audit-Publish) is enabled and the session sets `spark.wap.branch`,
`canDeleteWhere()` and `deleteWhere()` operate on different branches:
- `canDeleteWhere()` scans `this.branch` (null → main), because the WAP branch is only a session config and is not part of the table identifier
- `deleteWhere()` resolves the WAP branch before committing
As a result, `canDeleteWhere()` can incorrectly return true (a metadata-only
delete is possible) based on main's data, while `deleteWhere()` commits to the
WAP branch, where the file only partially matches the filter. The commit then fails with:
`ValidationException: Cannot delete file where some, but not all, rows match filter`
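The failure comes down to a per-file decision. The toy model below (not Iceberg's implementation) assumes each data file is summarized only by the min/max of an integer `id` column and that the filter is `id = value`; Iceberg's real check relies on its own metrics evaluators. A metadata-only delete is safe only when every file either fully matches or does not match at all, so an empty `main` passes trivially while a `dev1` file holding ids 1 to 3 is a partial match.

```java
import java.util.List;

// Toy model, NOT Iceberg's implementation: each data file is reduced to the
// min/max bounds of one integer column `id`, and the delete filter is `id = value`.
class MetadataDeleteSketch {

  enum Match { ALL, NONE, PARTIAL }

  record DataFile(String path, int minId, int maxId) {}

  // Classifies one file against `id = value` using only its column bounds.
  static Match classify(DataFile file, int value) {
    if (value < file.minId() || value > file.maxId()) {
      return Match.NONE; // no row can match, so the file is left alone
    }
    if (file.minId() == value && file.maxId() == value) {
      return Match.ALL; // every row matches, so the whole file can be dropped
    }
    return Match.PARTIAL; // some rows may match, so rows would need rewriting
  }

  // A metadata-only delete is safe only if no file is a (possible) partial match.
  static boolean metadataDeletePossible(List<DataFile> files, int value) {
    return files.stream().noneMatch(f -> classify(f, value) == Match.PARTIAL);
  }

  public static void main(String[] args) {
    List<DataFile> mainFiles = List.of();                             // main is empty
    List<DataFile> dev1Files = List.of(new DataFile("dev1-0", 1, 3)); // ids 1, 2, 3

    // Checking main finds nothing to object to ...
    System.out.println(metadataDeletePossible(mainFiles, 1)); // true
    // ... but the commit targets dev1, where the only file is a partial match.
    System.out.println(metadataDeletePossible(dev1Files, 1)); // false
  }
}
```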
The fix adds WAP branch resolution to `canDeleteWhere()`, matching what
`deleteWhere()` already does, so both methods operate on the same branch.
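A minimal sketch of the fix's shape, using a hypothetical `WapBranchFixSketch` class rather than Iceberg's `SparkTable`: both `canDeleteWhere()` and `deleteWhere()` obtain their branch from one helper, `resolveBranch()`, which stands in for `SparkTableUtil.determineWriteBranch`. The helper's precedence rules are an assumption, the Spark session config is modeled as a plain map, and each branch holds a single file listed by its ids, set up like the `dev1` scenario in this PR.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the fix, NOT Iceberg's SparkTable. Each branch holds a
// single data file, modeled as the list of ids it contains.
class WapBranchFixSketch {

  static final String WAP_BRANCH_CONF = "spark.wap.branch";

  final Map<String, List<Integer>> branches = new HashMap<>(); // branch -> file rows
  final Map<String, String> sessionConf = new HashMap<>();     // stand-in for Spark conf
  final String branch; // branch from the table identifier; null if none

  WapBranchFixSketch(String branch) {
    this.branch = branch;
  }

  // Stand-in for SparkTableUtil.determineWriteBranch (assumed behavior): an explicit
  // branch wins (conflict checks omitted), then the session's WAP branch, then main.
  String resolveBranch() {
    if (branch != null) {
      return branch;
    }
    return sessionConf.getOrDefault(WAP_BRANCH_CONF, "main");
  }

  // True when the file has some, but not all, rows matching `id = value`.
  static boolean partialMatch(List<Integer> rows, int value) {
    boolean any = rows.contains(value);
    boolean all = rows.stream().allMatch(r -> r == value);
    return any && !all;
  }

  // Fixed check: inspects the same branch that deleteWhere() commits to.
  boolean canDeleteWhere(int value) {
    return !partialMatch(branches.getOrDefault(resolveBranch(), List.of()), value);
  }

  // Commit path: rejects partial matches, mirroring the ValidationException.
  void deleteWhere(int value) {
    String target = resolveBranch();
    List<Integer> rows = branches.getOrDefault(target, List.of());
    if (partialMatch(rows, value)) {
      throw new IllegalStateException(
          "Cannot delete file where some, but not all, rows match filter");
    }
    if (rows.contains(value)) {
      branches.put(target, List.of()); // every row matched: drop the whole file
    }
  }

  public static void main(String[] args) {
    WapBranchFixSketch t = new WapBranchFixSketch(null);
    t.sessionConf.put(WAP_BRANCH_CONF, "dev1");
    t.branches.put("dev1", List.of(1, 2, 3)); // the INSERT landed on dev1; main is empty

    // With both methods resolving dev1, the check reports the partial match up
    // front instead of approving a metadata-only delete that would fail on commit.
    System.out.println(t.canDeleteWhere(1)); // false
  }
}
```

Routing both methods through a single resolver is the design point: the check and the commit cannot drift apart, whatever the resolution rules turn out to be.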
**Example**
```sql
-- WAP enabled on t, spark.wap.branch = dev1
ALTER TABLE t SET TBLPROPERTIES ('write.wap.enabled' = 'true');
SET spark.wap.branch = dev1;
INSERT INTO t VALUES (1, 'a'), (2, 'b'), (3, 'c'); -- goes to dev1; main is empty
DELETE FROM t WHERE id = 1;
-- canDeleteWhere scans main (empty) → true → metadata delete
-- deleteWhere commits to dev1 → partial match → ValidationException
```
**Further consideration: refactoring the delete path**
Instead of reassigning the `branch` instance field, as the current
`deleteWhere()` code does, should we use a local variable that holds the
result of `SparkTableUtil.determineWriteBranch`?
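To make the two options concrete, here is a hypothetical sketch: none of these classes are Iceberg code, and `resolve()` is an assumed stand-in for `SparkTableUtil.determineWriteBranch`. One variant reassigns the `branch` field inside `deleteWhere()`, the other keeps the resolved branch in a local variable.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch contrasting two ways deleteWhere() could use the resolved
// branch. Neither class is Iceberg code; resolve() is an assumed stand-in for
// SparkTableUtil.determineWriteBranch.
class BranchResolutionStyles {

  static final String WAP_BRANCH_CONF = "spark.wap.branch";

  // Explicit branch first, then the session's WAP branch, then main.
  static String resolve(Map<String, String> conf, String explicitBranch) {
    return explicitBranch != null ? explicitBranch : conf.getOrDefault(WAP_BRANCH_CONF, "main");
  }

  // Style 1: overwrite the field, as the PR describes deleteWhere() doing today.
  static class FieldReassigningTable {
    final Map<String, String> conf;
    String branch; // starts null (no branch in the identifier); reassigned per call

    FieldReassigningTable(Map<String, String> conf) {
      this.conf = conf;
    }

    String deleteWhere() {
      branch = resolve(conf, branch);
      return branch; // the branch the delete would commit to
    }
  }

  // Style 2: keep the resolved branch in a local variable; the field never changes.
  static class LocalVariableTable {
    final Map<String, String> conf;
    final String branch;

    LocalVariableTable(Map<String, String> conf, String branch) {
      this.conf = conf;
      this.branch = branch;
    }

    String deleteWhere() {
      String writeBranch = resolve(conf, branch);
      return writeBranch; // the branch the delete would commit to
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(WAP_BRANCH_CONF, "dev1");
    FieldReassigningTable reassigning = new FieldReassigningTable(conf);
    LocalVariableTable local = new LocalVariableTable(conf, null);

    System.out.println(reassigning.deleteWhere() + " " + local.deleteWhere()); // dev1 dev1
    conf.remove(WAP_BRANCH_CONF); // the session unsets its WAP branch
    System.out.println(reassigning.deleteWhere() + " " + local.deleteWhere()); // dev1 main
  }
}
```

In this sketch the reassigned field keeps pointing at `dev1` after the session unsets the WAP branch, while the local-variable version re-resolves on every call. It only illustrates the side effect on shared state; it does not claim that a real `SparkTable` instance is reused across such a config change.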
I will backport this to the other Spark versions once there is consensus
from the community.
--
This is an automated message from the Apache Git Service.