rdblue commented on code in PR #5568:
URL: https://github.com/apache/iceberg/pull/5568#discussion_r949392983
##########
core/src/main/java/org/apache/iceberg/TableMetadata.java:
##########
@@ -1582,13 +1582,6 @@ private static List<HistoryEntry> updateSnapshotLog(
&& !intermediateSnapshotIds.contains(snapshotId)) {
// copy the log entries that are still valid
newSnapshotLog.add(logEntry);
- } else {
- // any invalid entry causes the history before it to be removed.
otherwise, there could be
- // history gaps that cause time-travel queries to produce incorrect
results. for example,
- // if history is [(t1, s1), (t2, s2), (t3, s3)] and s2 is removed,
the history cannot be
- // [(t1, s1), (t3, s3)] because it appears that s3 was current
during the time between t2
- // and t3 when in fact s2 was the current snapshot.
- newSnapshotLog.clear();
Review Comment:
This problem was introduced in #3664. The [original
code](https://github.com/apache/iceberg/commit/6d3b1f7996d4ceee5d5e28df7f378f6fe377950e#diff-c540a31e66b157a8f080433c82a29a070096d0e08c6578a0099153f1229bdb7aL635-L649)
that this comment applies to cleared the log when it encountered a snapshot
that has been expired and removed. That's the correct behavior.
The problem is that I combined that behavior with the check to suppress
intermediate snapshots in a transaction. We should suppress those snapshots in
the log, but we don't need to clear the log. Here's what I think should happen
instead:
```java
if (snapshotsById.containsKey(snapshotId)) {
if (!intermediateSnapshotIds.contains(snapshotId)) {
// copy the log entries that are still valid
newSnapshotLog.add(logEntry);
}
} else {
// any invalid entry causes the history before it to be removed.
otherwise, there could be
// history gaps that cause time-travel queries to produce
incorrect results. for example,
// if history is [(t1, s1), (t2, s2), (t3, s3)] and s2 is removed,
the history cannot be
// [(t1, s1), (t3, s3)] because it appears that s3 was current
during the time between t2
// and t3 when in fact s2 was the current snapshot.
newSnapshotLog.clear();
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]