SamWheating opened a new issue, #14953:
URL: https://github.com/apache/iceberg/issues/14953
### Apache Iceberg version
1.10.1 (latest release)
### Query engine
Spark
### Please describe the bug 🐞
I have been experimenting with various write-audit-publish workflows and I
noticed this potentially hazardous behaviour when using `spark.wap.id` to stage
changes.
If multiple snapshots are created with the same `wap.id`, then the
`publish_changes` procedure will only cherry-pick the earliest matching
snapshot, silently ignoring any other staged changes.
Reproduction, using the spark-sql quickstart:
```sql
CREATE TABLE demo.default.wap_example
(country string, population bigint)
USING ICEBERG
PARTITIONED BY (country)
TBLPROPERTIES ('write.wap.enabled'='true');
SET spark.wap.id=wap_with_two_snapshots;
-- write two rows into two partitions (will create two snapshots with the
-- same wap.id)
INSERT INTO demo.default.wap_example VALUES ('Canada', 40000000);
INSERT INTO demo.default.wap_example VALUES ('USA', 340000000);
CALL demo.system.publish_changes('demo.default.wap_example',
'wap_with_two_snapshots');
SELECT * from demo.default.wap_example;
-- Canada 40000000
-- Time taken: 0.052 seconds, Fetched 1 row(s)
```
This makes sense looking at the procedure's definition
[here](https://github.com/apache/iceberg/blob/00046005889c66ffc860bae012d7fc560e8f040a/spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/procedures/PublishChangesProcedure.java#L102-L105),
but I think that this is potentially harmful, as it can result in writes being
silently lost.
Based on the [validation performed when cherry-picking
snapshots](https://github.com/apache/iceberg/blob/00046005889c66ffc860bae012d7fc560e8f040a/core/src/main/java/org/apache/iceberg/util/WapUtil.java#L42-L60),
it looks like it's expected that `wap.id` will be unique among snapshots. In
this case, I think the `publish_changes` procedure should raise an error if
multiple snapshots match, to prevent any ambiguity.
I'd be happy to work on a fix, but I want to ensure that my understanding is
correct here.
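To make the proposal concrete, here is a rough, self-contained sketch of the kind of check I have in mind. It is not the actual procedure code: `WapPublishCheck`, `Snap`, and `findUniqueWapSnapshot` are hypothetical names for illustration, and snapshots are modelled as a plain id plus summary map. The only detail taken from Iceberg is the `wap.id` summary key.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WapPublishCheck {

  // Hypothetical stand-in for an Iceberg snapshot: just an id and a summary map.
  static final class Snap {
    final long id;
    final Map<String, String> summary;

    Snap(long id, Map<String, String> summary) {
      this.id = id;
      this.summary = summary;
    }
  }

  // Returns the id of the single snapshot staged under wapId.
  // Throws if no snapshot matches, or if more than one matches (the ambiguous case).
  static long findUniqueWapSnapshot(List<Snap> snapshots, String wapId) {
    List<Long> matches = snapshots.stream()
        .filter(s -> wapId.equals(s.summary.get("wap.id")))
        .map(s -> s.id)
        .collect(Collectors.toList());

    if (matches.isEmpty()) {
      throw new IllegalArgumentException("No snapshot found with wap.id = " + wapId);
    }
    if (matches.size() > 1) {
      throw new IllegalStateException(
          "Found " + matches.size() + " snapshots with wap.id = " + wapId + ": " + matches);
    }
    return matches.get(0);
  }

  public static void main(String[] args) {
    // Mirrors the reproduction above: two staged snapshots sharing one wap.id.
    List<Snap> staged = List.of(
        new Snap(1L, Map.of("wap.id", "wap_with_two_snapshots")),
        new Snap(2L, Map.of("wap.id", "wap_with_two_snapshots")));
    try {
      findUniqueWapSnapshot(staged, "wap_with_two_snapshots");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With a check along these lines, the reproduction above would fail loudly instead of publishing only the first insert.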
### Willingness to contribute
- [x] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from
the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]