SamWheating commented on code in PR #14955:
URL: https://github.com/apache/iceberg/pull/14955#discussion_r2658615907


##########
spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/procedures/PublishChangesProcedure.java:
##########
@@ -97,23 +97,26 @@ public Iterator<Scan> call(InternalRow args) {
     return modifyIcebergTable(
         tableIdent,
         table -> {
-          Optional<Snapshot> wapSnapshot =
-              Optional.ofNullable(
-                  Iterables.find(
-                      table.snapshots(),
-                      snapshot -> wapId.equals(WapUtil.stagedWapId(snapshot)),
-                      null));
-          if (!wapSnapshot.isPresent()) {
-            throw new ValidationException("Cannot apply unknown WAP ID '%s'", wapId);
+          Iterable<Snapshot> wapSnapshots =
+              Iterables.filter(
+                  table.snapshots(), snapshot -> wapId.equals(WapUtil.stagedWapId(snapshot)));
+
+          int numMatchingSnapshots = Iterables.size(wapSnapshots);
+
+          switch (numMatchingSnapshots) {
+            case 0:
+              throw new ValidationException("Cannot apply unknown WAP ID '%s'", wapId);
+            case 1:
+              long wapSnapshotId = Iterables.getOnlyElement(wapSnapshots).snapshotId();
+              table.manageSnapshots().cherrypick(wapSnapshotId).commit();
+              Snapshot currentSnapshot = table.currentSnapshot();
+              InternalRow outputRow = newInternalRow(wapSnapshotId, currentSnapshot.snapshotId());
+              return asScanIterator(OUTPUT_TYPE, outputRow);
+            default:
+              throw new ValidationException(
+                  "Cannot apply non-unique WAP ID. Found %d snapshots with WAP ID '%s'",
+                  numMatchingSnapshots, wapId);

Review Comment:
   I don't have a strong opinion here, but this could be considered a more significant, potentially breaking change. Technically, duplicate WAP IDs don't cause any problems until the snapshots that share them are cherry-picked into main.
   
   Do you think there might be legitimate uses for staging multiple changes under the same WAP ID? For example:
    - staging multiple changes, evaluating each of them separately, and then deleting all but one before committing.
    - creating WAP snapshots that are never intended to be published (for testing, evaluation, etc.).
   
   I am not very familiar with the original design of WAP in Iceberg, so I'll look through older commits to see whether there is any mention of a uniqueness constraint.
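
To make the behavior under discussion concrete, here is a minimal, Iceberg-free sketch of the lookup in the diff: zero matches fail, exactly one match is selected, and more than one match fails. `WapIdLookup`, `StagedSnapshot`, and `findUniqueStagedSnapshot` are hypothetical names used only for illustration, and `IllegalArgumentException` stands in for Iceberg's `ValidationException`.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

final class WapIdLookup {
  // Hypothetical stand-in for a snapshot: an ID plus an optional staged WAP ID.
  public static final class StagedSnapshot {
    final long snapshotId;
    final String wapId; // null when the snapshot was not staged through WAP

    public StagedSnapshot(long snapshotId, String wapId) {
      this.snapshotId = snapshotId;
      this.wapId = wapId;
    }
  }

  // Returns the ID of the only snapshot staged under wapId.
  // Throws when no snapshot or more than one snapshot carries that WAP ID.
  public static long findUniqueStagedSnapshot(List<StagedSnapshot> snapshots, String wapId) {
    List<StagedSnapshot> matches =
        snapshots.stream()
            .filter(snapshot -> Objects.equals(wapId, snapshot.wapId))
            .collect(Collectors.toList());

    switch (matches.size()) {
      case 0:
        throw new IllegalArgumentException(
            String.format("Cannot apply unknown WAP ID '%s'", wapId));
      case 1:
        return matches.get(0).snapshotId;
      default:
        throw new IllegalArgumentException(
            String.format(
                "Cannot apply non-unique WAP ID. Found %d snapshots with WAP ID '%s'",
                matches.size(), wapId));
    }
  }

  private WapIdLookup() {}
}
```

In this sketch the uniqueness check only looks at the snapshots present when the lookup runs, so a list that once held several snapshots with the same WAP ID passes again as soon as all but one of them have been removed from it. Whether the real procedure behaves the same way depends on what `table.snapshots()` returns after staged snapshots are deleted, which this sketch does not model.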



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

