rdblue commented on a change in pull request #3775:
URL: https://github.com/apache/iceberg/pull/3775#discussion_r772636577



##########
File path: 
spark/v3.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkMicroBatchStream.java
##########
@@ -169,8 +169,7 @@ public void stop() {
   private List<FileScanTask> planFiles(StreamingOffset startOffset, 
StreamingOffset endOffset) {
     List<FileScanTask> fileScanTasks = Lists.newArrayList();
     StreamingOffset batchStartOffset = 
StreamingOffset.START_OFFSET.equals(startOffset) ?
-        new StreamingOffset(SnapshotUtil.firstSnapshotAfterTimestamp(table, 
fromTimestamp).snapshotId(), 0, false) :
-        startOffset;
+        determineStartingOffset(table, fromTimestamp) : startOffset;

Review comment:
       I think that the problem is that we have 3 "start" offsets here:
   1. `StreamingOffset.START_OFFSET` is a fake offset for when we don't have 
enough information, like when the table has no snapshots or the timestamp is in 
the future
   2. The "starting" offset referred to by `determineStartingOffset` is the 
concrete replacement for `START_OFFSET`, if the there is enough information to 
start; otherwise it returns `START_OFFSET` as a placeholder again
   3. `startOffset` is the known starting point for this batch, from the last 
batch's end offset or initialization
   
   I don't think that it makes sense to mix `startOffset` into the method 
because they're different things and `determineStartingOffset` is already 
fairly complicated. We could have a separate method, 
`determineBatchStartOffset` instead, but that would basically be this 
expression.
   
   A better fix for this confusion is probably to rename the method. Maybe 
`determineInitialOffset` is more clear?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to