aiborodin commented on code in PR #14509:
URL: https://github.com/apache/iceberg/pull/14509#discussion_r2496744800


##########
core/src/main/java/org/apache/iceberg/SnapshotProducer.java:
##########
@@ -459,6 +467,23 @@ public void commit() {
                     return;
                   }
 
+                  // Pass the snapshot ancestry of the base and the updated 
snapshots to the
+                  // validator.
+                  String validationBranch =

Review Comment:
   > I disagree with this since the original validate call is called before we 
build the new update. As a generalized API for validating the snapshots 
pre-commit, we really need to expose the ability to inspect what is planned to 
be committed.
   
   What's the value for clients to inspect the new internal metadata, which is 
about to be committed? The new parent `Snapshot` is sufficient minimal 
information for both Kafka Connect and Flink to decide if the validation 
passes, without relying on the new `TableMetadata`. Everything else can be 
enclosed in validation closures, as @rdblue suggested.
   
   There's a reason why the current `validate(base, parentSnapshot);` call runs 
before the new metadata is generated - first, we don't need these internal 
details to decide as I said above, but also, - because we don't want to waste 
CPU cycles, and IO writing the new manifests and the manifest list just to run 
the validation and immediately throw away these files if it fails.
   
   > The KC case only looks at the prior state, but I don't think we should 
focus so narrowly on that scenario.
   
   I think we can always enhance and extend this API in the future by adding 
more information if needed. For now, it seems reasonable to focus on adding the 
minimal required API, which aligns with the existing validation mechanism in 
`SnapshotProducer::validate(TableMetadata currentMetadata, Snapshot snapshot)`, 
and works for both Kafka Connect and Flink use cases.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to