singhpk234 commented on code in PR #4473:
URL: https://github.com/apache/iceberg/pull/4473#discussion_r855316914


##########
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkMicroBatchStream.java:
##########
@@ -273,14 +275,13 @@ public StreamingOffset initialOffset() {
       table.refresh();
       StreamingOffset offset = determineStartingOffset(table, fromTimestamp);
 
-      OutputFile outputFile = io.newOutputFile(initialOffsetLocation);
-      writeOffset(offset, outputFile);
-
+      writeOffset(offset);
       return offset;
     }
 
-    private void writeOffset(StreamingOffset offset, OutputFile file) {
-      try (OutputStream outputStream = file.create()) {
+    private void writeOffset(StreamingOffset offset) {
+      OutputFile file = io.newOutputFile(initialOffsetLocation);

Review Comment:
   [question] I have a doubt: can multiple streams writing to the same checkpoint location result in a nondeterministic state?
   
   Let's say the initial snapshot ID was snapshot(1).
   Table state: snapshot(1) -> snapshot(2) -> snapshot(3) -> snapshot(4)
   Now stream 1 starts in cluster 1 and reads/commits up to snapshot(2). At the same time, stream 2 starts in cluster 2 from snapshot(2). Before stream 2 can commit snapshot(3), stream 1 commits snapshot(3) and snapshot(4). When stream 2 then commits snapshot(3), it overwrites stream 1's state (or vice versa, depending on timing). As a result, the starting state of a new stream, say stream 3 in cluster 3, is nondeterministic, because one of the streams (stream 1 or stream 2) is running ahead of the other.
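To make the interleaving concrete, here is a minimal Java sketch that replays it sequentially against a single offset slot with last-writer-wins semantics. It is a hypothetical model, assuming each stream blindly overwrites the same offset file; `CheckpointRace`, `commit`, and `replay` are illustrative names, not Iceberg APIs.

```java
// Hypothetical model of the scenario above, not Iceberg code: one shared
// offset slot with last-writer-wins semantics stands in for the checkpoint's
// offset file. Class and method names are illustrative only.
class CheckpointRace {
  private long committedSnapshotId; // latest snapshot id stored in the shared slot

  CheckpointRace(long initialSnapshotId) {
    this.committedSnapshotId = initialSnapshotId;
  }

  // Blind overwrite: nothing checks that the new offset is ahead of the stored one.
  void commit(long snapshotId) {
    committedSnapshotId = snapshotId;
  }

  long current() {
    return committedSnapshotId;
  }

  // Replays one ordering of the two streams and returns the offset
  // a new stream 3 would start from.
  static long replay(boolean stream2CommitsLast) {
    CheckpointRace slot = new CheckpointRace(1); // snapshot(1)
    slot.commit(2);                              // stream 1 reaches snapshot(2)
    long stream2Next = slot.current() + 1;       // stream 2 starts at snapshot(2) and will commit snapshot(3)
    if (!stream2CommitsLast) {
      slot.commit(stream2Next);                  // stream 2 writes before stream 1 moves on
    }
    slot.commit(3);                              // stream 1 commits snapshot(3)
    slot.commit(4);                              // stream 1 commits snapshot(4)
    if (stream2CommitsLast) {
      slot.commit(stream2Next);                  // stream 2's late write moves the slot back to snapshot(3)
    }
    return slot.current();
  }

  public static void main(String[] args) {
    System.out.println("stream 2 commits last:  stream 3 starts at snapshot(" + replay(true) + ")");
    System.out.println("stream 2 commits first: stream 3 starts at snapshot(" + replay(false) + ")");
  }
}
```

`replay(true)` returns 3 and `replay(false)` returns 4, so the offset stream 3 starts from depends only on which stream happened to write last.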
   
   Also, previously, if two streams had started at the same time (with no previous checkpoint/offset file), one of them would have failed, since we called file.create(); now they can co-exist. Your thoughts?
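The difference between the old and new behaviour comes down to exclusive-create versus overwrite semantics. Iceberg's `OutputFile.create()` throws if the file already exists, while `OutputFile.createOrOverwrite()` replaces it; the new `writeOffset` body is not fully visible in the hunk above, so which call it ends up using is an assumption here. The sketch below uses `java.nio.file` as a local-filesystem stand-in: `CREATE_NEW` plays the role of `create()`, and `CREATE` plus `TRUNCATE_EXISTING` plays the role of an overwrite. `OffsetFileSemantics` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper contrasting two write modes for an offset file.
// java.nio stands in for Iceberg's FileIO/OutputFile here.
class OffsetFileSemantics {
  // Exclusive create (like OutputFile.create()): returns false if the file
  // already exists, so a second initializing stream could fail fast.
  static boolean writeExclusive(Path file, String offset) throws IOException {
    try {
      Files.write(file, offset.getBytes(StandardCharsets.UTF_8),
          StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
      return true;
    } catch (FileAlreadyExistsException e) {
      return false;
    }
  }

  // Overwrite (like OutputFile.createOrOverwrite()): the last writer silently wins.
  static void writeOverwrite(Path file, String offset) throws IOException {
    Files.write(file, offset.getBytes(StandardCharsets.UTF_8),
        StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING,
        StandardOpenOption.WRITE);
  }

  static String read(Path file) throws IOException {
    return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("offsets");

    Path exclusive = dir.resolve("exclusive-offset");
    System.out.println(writeExclusive(exclusive, "snapshot(2)")); // first stream: true
    System.out.println(writeExclusive(exclusive, "snapshot(3)")); // second stream: false
    System.out.println(read(exclusive));                          // snapshot(2)

    Path overwrite = dir.resolve("overwrite-offset");
    writeOverwrite(overwrite, "snapshot(2)");                     // first stream
    writeOverwrite(overwrite, "snapshot(3)");                     // second stream
    System.out.println(read(overwrite));                          // snapshot(3)
  }
}
```

With exclusive create, the second stream to initialize gets `false` and can fail, which matches the earlier behaviour described in the comment; with overwrite, both writes succeed and the file silently holds whichever offset was written last.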



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


