[GitHub] [iceberg] zhengzengprc commented on a change in pull request #3039: Introduce spark3 option to read stream from a timestamp

GitBox Fri, 24 Sep 2021 19:18:04 -0700


zhengzengprc commented on a change in pull request #3039:
URL: https://github.com/apache/iceberg/pull/3039#discussion_r715975873




##########
File path: 
spark3/src/main/java/org/apache/iceberg/spark/source/SparkMicroBatchStream.java
##########
@@ -238,9 +247,8 @@ public StreamingOffset initialOffset() {
       }
 
       table.refresh();
-      StreamingOffset offset = table.currentSnapshot() == null ?
-          StreamingOffset.START_OFFSET :
-          new StreamingOffset(SnapshotUtil.oldestSnapshot(table).snapshotId(), 0, false);
+      StreamingOffset offset = isStreamEmpty(table, fromTimestamp) ? StreamingOffset.START_OFFSET :
+          new StreamingOffset(SnapshotUtil.oldestSnapshot(table, fromTimestamp).snapshotId(), 0, false);

Review comment:
       Sorry, my understanding was wrong. @kbendick Yes, that's exactly the 
behavior.
   I just created a unit test to confirm the scenario above. When we pass 30, 
the stream processing returns nothing, because:
     @Override
     public InputPartition[] planInputPartitions(Offset start, Offset end) {
       Preconditions.checkArgument(
           end instanceof StreamingOffset, "Invalid end offset: %s is not a StreamingOffset", end);
       Preconditions.checkArgument(
           start instanceof StreamingOffset, "Invalid start offset: %s is not a StreamingOffset", start);
   
       if (end.equals(StreamingOffset.START_OFFSET)) {
         return new InputPartition[0];
       }
   
   In this case start == end, so there is no offset range left to process.
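
   To make the scenario concrete, here is a minimal, self-contained sketch of 
the behavior described above. It is not the Iceberg implementation: the 
Offset and Snapshot classes, the sentinel values in START_OFFSET, and the 
bodies of isStreamEmpty and initialOffset are all hypothetical stand-ins, and 
it assumes that "30" is a timestamp later than every snapshot in the table.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative stand-ins only; none of these are the real Iceberg classes.
final class MicroBatchSketch {

    // Hypothetical offset: a snapshot id plus a position inside that snapshot.
    static final class Offset {
        // Sentinel meaning "nothing to read yet" (values chosen for the sketch).
        static final Offset START_OFFSET = new Offset(-1L, -1L);

        final long snapshotId;
        final long position;

        Offset(long snapshotId, long position) {
            this.snapshotId = snapshotId;
            this.position = position;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Offset)) {
                return false;
            }
            Offset that = (Offset) other;
            return snapshotId == that.snapshotId && position == that.position;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(snapshotId) * 31 + Long.hashCode(position);
        }
    }

    // Hypothetical snapshot: an id and a commit timestamp.
    static final class Snapshot {
        final long id;
        final long timestampMillis;

        Snapshot(long id, long timestampMillis) {
            this.id = id;
            this.timestampMillis = timestampMillis;
        }
    }

    // Assumption: the stream counts as empty when no snapshot was committed
    // at or after fromTimestamp.
    static boolean isStreamEmpty(List<Snapshot> snapshots, long fromTimestamp) {
        for (Snapshot snapshot : snapshots) {
            if (snapshot.timestampMillis >= fromTimestamp) {
                return false;
            }
        }
        return true;
    }

    // Mirrors the shape of the changed line in the diff: START_OFFSET for an
    // empty stream, otherwise the oldest snapshot at or after fromTimestamp.
    static Offset initialOffset(List<Snapshot> snapshots, long fromTimestamp) {
        if (isStreamEmpty(snapshots, fromTimestamp)) {
            return Offset.START_OFFSET;
        }
        Snapshot oldest = null;
        for (Snapshot snapshot : snapshots) {
            boolean eligible = snapshot.timestampMillis >= fromTimestamp;
            if (eligible && (oldest == null || snapshot.timestampMillis < oldest.timestampMillis)) {
                oldest = snapshot;
            }
        }
        return new Offset(oldest.id, 0L);
    }

    // Same early return as the quoted planInputPartitions: an end offset equal
    // to START_OFFSET yields no partitions. Partitions are plain strings here.
    static String[] planInputPartitions(Offset start, Offset end) {
        if (end.equals(Offset.START_OFFSET)) {
            return new String[0];
        }
        return new String[] {start.snapshotId + ".." + end.snapshotId};
    }

    public static void main(String[] args) {
        List<Snapshot> snapshots = Arrays.asList(new Snapshot(1L, 10L), new Snapshot(2L, 20L));
        Offset start = initialOffset(snapshots, 30L);
        Offset end = start; // nothing newer exists, so the end offset does not advance
        System.out.println("partitions planned: " + planInputPartitions(start, end).length);
    }
}
```

   With a timestamp past the last snapshot, both offsets are the sentinel and 
the planner returns an empty array, which matches what the unit test observed.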




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
