[GitHub] [iceberg] rdblue commented on a change in pull request #3039: Introduce spark3 option to read stream from a timestamp

GitBox Sun, 19 Sep 2021 16:16:27 -0700


rdblue commented on a change in pull request #3039:
URL: https://github.com/apache/iceberg/pull/3039#discussion_r711818802




##########
File path: spark3/src/main/java/org/apache/iceberg/spark/source/SparkMicroBatchStream.java
##########
@@ -238,9 +247,8 @@ public StreamingOffset initialOffset() {
       }
 
       table.refresh();
-      StreamingOffset offset = table.currentSnapshot() == null ?
-          StreamingOffset.START_OFFSET :
-          new StreamingOffset(SnapshotUtil.oldestSnapshot(table).snapshotId(), 0, false);
+      StreamingOffset offset = isStreamEmpty(table, fromTimestamp) ? StreamingOffset.START_OFFSET :
+          new StreamingOffset(SnapshotUtil.oldestSnapshot(table, fromTimestamp).snapshotId(), 0, false);

Review comment:
       I don't think this is correct. If there is no current snapshot, then 
`START_OFFSET` is correct. But if there is a current snapshot and the starting 
timestamp is after that snapshot, then the offset should be set as though the 
current snapshot had been completely processed. `START_OFFSET` is not 
appropriate in that case because it requires resuming from a snapshot with no 
parent.
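
One possible reading of the three cases described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from the pull request: `Snap`, `Offset`, `START_OFFSET` and `initialOffset` are hypothetical stand-ins rather than Iceberg's `Snapshot`, `StreamingOffset` or `SnapshotUtil`. In particular, representing "completely processed" as a position equal to the snapshot's added-file count, and matching snapshots whose timestamp is at or after the starting timestamp, are assumptions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class InitialOffsetSketch {
  // Hypothetical stand-in for a table snapshot (not the Iceberg Snapshot interface).
  static final class Snap {
    final long id;
    final long timestampMillis;
    final long addedFiles;

    Snap(long id, long timestampMillis, long addedFiles) {
      this.id = id;
      this.timestampMillis = timestampMillis;
      this.addedFiles = addedFiles;
    }
  }

  // Hypothetical stand-in for a streaming offset: snapshot id, position, scan-all flag.
  static final class Offset {
    final long snapshotId;
    final long position;
    final boolean scanAllFiles;

    Offset(long snapshotId, long position, boolean scanAllFiles) {
      this.snapshotId = snapshotId;
      this.position = position;
      this.scanAllFiles = scanAllFiles;
    }
  }

  // Assumed sentinel meaning "start from a snapshot with no parent".
  static final Offset START_OFFSET = new Offset(-1L, -1L, false);

  // snapshots must be ordered oldest to newest; an empty list means no current snapshot.
  static Offset initialOffset(List<Snap> snapshots, long fromTimestamp) {
    if (snapshots.isEmpty()) {
      // Case 1: no current snapshot, so the sentinel is appropriate.
      return START_OFFSET;
    }

    Snap current = snapshots.get(snapshots.size() - 1);
    if (fromTimestamp > current.timestampMillis) {
      // Case 2: the starting timestamp is after the current snapshot.
      // Assumption: "completely processed" means position == number of added files.
      return new Offset(current.id, current.addedFiles, false);
    }

    // Case 3: start at the beginning of the oldest snapshot at or after the timestamp.
    for (Snap s : snapshots) {
      if (s.timestampMillis >= fromTimestamp) {
        return new Offset(s.id, 0L, false);
      }
    }
    throw new IllegalStateException("unreachable: current snapshot satisfies the bound");
  }

  public static void main(String[] args) {
    List<Snap> snaps = Arrays.asList(new Snap(1L, 1000L, 3L), new Snap(2L, 2000L, 5L));
    Offset late = initialOffset(snaps, 3000L);
    System.out.println(late.snapshotId + " @ " + late.position);
    Offset none = initialOffset(Collections.<Snap>emptyList(), 3000L);
    System.out.println(none == START_OFFSET);
  }
}
```

In this sketch, a starting timestamp later than the newest snapshot yields an offset at the end of that snapshot, so a reader resuming from it would only pick up snapshots committed afterwards. The sentinel is returned only when the table has no snapshots at all.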



