rdblue commented on code in PR #4071:
URL: https://github.com/apache/iceberg/pull/4071#discussion_r841898736
##########
core/src/main/java/org/apache/iceberg/TableMetadata.java:
##########
@@ -1007,6 +1007,49 @@ public Builder setBranchSnapshot(long snapshotId, String
branch) {
return this;
}
+ public Builder setTag(String tag, SnapshotRef ref) {
+ SnapshotRef existingRef = refs.get(tag);
+ if (existingRef != null && existingRef.equals(ref)) {
+ return this;
+ }
+ long snapshotId = ref.snapshotId();
+ Snapshot snapshot = snapshotsById.get(snapshotId);
+ ValidationException.check(snapshot != null, "Cannot set %s to unknown
snapshot: %s", tag, snapshotId);
+ ValidationException.check(ref.isTag(), "Cannot create tag: %s is a
branch", tag);
+ refs.put(tag, ref);
+ changes.add(new MetadataUpdate.SetSnapshotRef(tag, snapshotId,
SnapshotRefType.TAG, null, null, null));
+ return this;
+ }
+
+ public Builder setBranch(String branch, SnapshotRef ref) {
+ SnapshotRef existingRef = refs.get(branch);
+ if (existingRef != null && existingRef.equals(ref)) {
+ return this;
+ }
+ long snapshotId = ref.snapshotId();
+ Snapshot snapshot = snapshotsById.get(snapshotId);
+ ValidationException.check(snapshot != null, "Cannot set %s to unknown
snapshot: %s", branch, snapshotId);
+ ValidationException.check(ref.isBranch(), "Cannot create branch: %s is a
tag", branch);
+ refs.put(branch, ref);
+ MetadataUpdate.SetSnapshotRef refUpdate =
+ new MetadataUpdate.SetSnapshotRef(branch, snapshotId,
SnapshotRefType.BRANCH);
+ boolean newRef = existingRef == null;
+ boolean updateMinSnapshots = newRef ||
!Objects.equal(ref.minSnapshotsToKeep(), existingRef.minSnapshotsToKeep());
Review Comment:
> I thought the intent for the MetadataUpdate was to encapsulate as minimal
changes as possible?
If things aren't changing, then we don't include them. So if there is a
no-op change to a snapshot, we just ignore it. But this is a bit different
because we are sending updated state.
If the user calling our `ManageSnapshots` API set `minSnapshotsToKeep` to 1
and we see that it did not change, we can either send it to ensure the user's
intent is carried through or we can suppress it because it is, in theory, a
noop. Neither one of these is perfect. If we send the whole ref state and the
user didn't set the field, it could revert a concurrent change. If we do send
the whole ref state and the user did set the field, the change would not be
known on the server side and a concurrent change could be left in place.
We have 3 options:
1. Always update the ref state to what we have locally
2. Always update only what changed, even though explicitly setting something
to its current value could allow concurrent changes
3. Send requirements to validate no concurrent changes have occurred
I would opt for choice 1, where anything explicitly set locally is sent and
the change is idempotent. The only way to completely avoid the problem is
option 3, but adding requirements for this just over-complicates the REST API
for very little gain. Concurrent changes to ref retention settings are going to
be very, very rare. And we already have a requirement for the ref snapshot-id.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]