danielcweeks commented on code in PR #14509:
URL: https://github.com/apache/iceberg/pull/14509#discussion_r2499796286
##########
core/src/main/java/org/apache/iceberg/SnapshotProducer.java:
##########
@@ -459,6 +467,23 @@ public void commit() {
return;
}
+ // Pass the snapshot ancestry of the base and the updated
snapshots to the
+ // validator.
+ String validationBranch =
Review Comment:
The way I think about is if we're doing snapshot validation, we need to have
insight into two things: 1) what was my snapshot built from, 2) what is
actually being produced.
For #1, this is the main issue for KC specifically, but is also a more
general problem. For example, we can't provide serializable isolation in core
currently because we "secretly" refresh state the client operated on. For
example, if I build an append commit based off the table state as I see it, but
the table refreshing underneath me before committing, I cannot guarantee that
another append wouldn't have affected the change I made, but I don't ever see
that and it will be committed.
For #2, (which is the point were debating here), if we're validating
snapshots, I would want to have the final result of all of the operations the
library performed otherwise there's still a gap between what I validate and
what is actually produced. That's not ideal. This is why I find the
`TableMetadata` validation approach more useful because that represents exactly
what will be committed (exclusive of serialization), which provides the most
fidelity.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]