rdblue commented on code in PR #5568:
URL: https://github.com/apache/iceberg/pull/5568#discussion_r949392983


##########
core/src/main/java/org/apache/iceberg/TableMetadata.java:
##########
@@ -1582,13 +1582,6 @@ private static List<HistoryEntry> updateSnapshotLog(
             && !intermediateSnapshotIds.contains(snapshotId)) {
           // copy the log entries that are still valid
           newSnapshotLog.add(logEntry);
-        } else {
-          // any invalid entry causes the history before it to be removed. 
otherwise, there could be
-          // history gaps that cause time-travel queries to produce incorrect 
results. for example,
-          // if history is [(t1, s1), (t2, s2), (t3, s3)] and s2 is removed, 
the history cannot be
-          // [(t1, s1), (t3, s3)] because it appears that s3 was current 
during the time between t2
-          // and t3 when in fact s2 was the current snapshot.
-          newSnapshotLog.clear();

Review Comment:
   This problem was introduced in #3664. The [original 
code](https://github.com/apache/iceberg/commit/6d3b1f7996d4ceee5d5e28df7f378f6fe377950e#diff-c540a31e66b157a8f080433c82a29a070096d0e08c6578a0099153f1229bdb7aL635-L649)
 that this comment applies to cleared the log when it encountered a snapshot 
that has been expired and removed. That's the correct behavior.
   
   The problem is that I combined that behavior with the check to suppress 
intermediate snapshots in a transaction. We should suppress those snapshots in 
the log, but we don't need to clear the log. Here's what I think should happen 
instead:
   
   ```java
           if (snapshotsById.containsKey(snapshotId)) {
             if (!intermediateSnapshotIds.contains(snapshotId)) {
               // copy the log entries that are still valid
               newSnapshotLog.add(logEntry);
             }
           } else {
             // any invalid entry causes the history before it to be removed. 
otherwise, there could be
             // history gaps that cause time-travel queries to produce 
incorrect results. for example,
             // if history is [(t1, s1), (t2, s2), (t3, s3)] and s2 is removed, 
the history cannot be
             // [(t1, s1), (t3, s3)] because it appears that s3 was current 
during the time between t2
             // and t3 when in fact s2 was the current snapshot.
             newSnapshotLog.clear();
           }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to