twuebi commented on PR #638:
URL: https://github.com/apache/iceberg-go/pull/638#issuecomment-3597385204

   Yes, @zeroshade, I must have misunderstood iceberg-java's updateSnapshot 
logic, and it seems that iceberg-rust may have a bug there, this is java 
source, there's no clearing snapshot log on removeRef, only upon 
`MetadataUpdate.RemoveSnapshot`, dangling references are purged from the 
snapshotlog
   
   ```java
   public Builder removeRef(String name) {
         if (SnapshotRef.MAIN_BRANCH.equals(name)) {
           this.currentSnapshotId = -1;
         }
   
         SnapshotRef ref = refs.remove(name);
         if (ref != null) {
           changes.add(new MetadataUpdate.RemoveSnapshotRef(name));
         }
   
         return this;
       }
   ```
   
   ```java
   private static List<HistoryEntry> updateSnapshotLog(
           List<HistoryEntry> snapshotLog,
           Map<Long, Snapshot> snapshotsById,
           long currentSnapshotId,
           List<MetadataUpdate> changes) {
         Set<Long> intermediateSnapshotIds = intermediateSnapshotIdSet(changes, 
currentSnapshotId);
         boolean hasIntermediateSnapshots = !intermediateSnapshotIds.isEmpty();
         boolean hasRemovedSnapshots =
             changes.stream()
                 .anyMatch(
                     change ->
                         change instanceof MetadataUpdate.RemoveSnapshots
                             || change instanceof 
MetadataUpdate.RemoveSnapshot);
   
         if (!hasIntermediateSnapshots && !hasRemovedSnapshots) {
           return snapshotLog;
         }
   
         // update the snapshot log
         List<HistoryEntry> newSnapshotLog = Lists.newArrayList();
         for (HistoryEntry logEntry : snapshotLog) {
           long snapshotId = logEntry.snapshotId();
           if (snapshotsById.containsKey(snapshotId)) {
             if (!intermediateSnapshotIds.contains(snapshotId)) {
               // copy the log entries that are still valid
               newSnapshotLog.add(logEntry);
             }
           } else if (hasRemovedSnapshots) {
             // any invalid entry causes the history before it to be removed. 
otherwise, there could be
             // history gaps that cause time-travel queries to produce 
incorrect results. for example,
             // if history is [(t1, s1), (t2, s2), (t3, s3)] and s2 is removed, 
the history cannot be
             // [(t1, s1), (t3, s3)] because it appears that s3 was current 
during the time between t2
             // and t3 when in fact s2 was the current snapshot.
             newSnapshotLog.clear();
           }
         }
   
         if (snapshotsById.get(currentSnapshotId) != null) {
           ValidationException.check(
               Iterables.getLast(newSnapshotLog).snapshotId() == 
currentSnapshotId,
               "Cannot set invalid snapshot log: latest entry is not the 
current snapshot");
         }
   
         return newSnapshotLog;
       }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to