singhpk234 commented on code in PR #4473:
URL: https://github.com/apache/iceberg/pull/4473#discussion_r855316914
##########
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkMicroBatchStream.java:
##########
@@ -273,14 +275,13 @@ public StreamingOffset initialOffset() {
table.refresh();
StreamingOffset offset = determineStartingOffset(table, fromTimestamp);
- OutputFile outputFile = io.newOutputFile(initialOffsetLocation);
- writeOffset(offset, outputFile);
-
+ writeOffset(offset);
return offset;
}
- private void writeOffset(StreamingOffset offset, OutputFile file) {
- try (OutputStream outputStream = file.create()) {
+ private void writeOffset(StreamingOffset offset) {
+ OutputFile file = io.newOutputFile(initialOffsetLocation);
Review Comment:
[question] I have a doubt: can multiple streams writing to the same
checkpoint location result in a nondeterministic state?
Say the initial snapshot id was snapshot(1), and the table state is
snapshot(1) -> snapshot(2) -> snapshot(3) -> snapshot(4).
Stream 1 starts in cluster 1 and reads/commits up to snapshot(2). At the
same time, stream 2 starts in cluster 2 from snapshot(2); before it can
commit snapshot(3), stream 1 commits snapshot(3) and snapshot(4). When
stream 2 then commits snapshot(3), it overwrites the state of stream 1 (or
vice versa), which makes the starting state of a new stream, say stream 3
in cluster 3, nondeterministic, since one stream is running ahead of the
other (stream 1 / stream 2).
Also, previously, if two streams had started at the same time (with no
checkpoint file), one would have failed since we did a file.create(); now
they can co-exist. Your thoughts?
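The difference the comment describes can be sketched with plain `java.nio` file operations (a hypothetical stand-in for Iceberg's `FileIO`/`OutputFile`, which this sketch does not use): an atomic create-if-absent makes the second concurrent writer fail, as the old `file.create()` did, while an overwriting write lets the later writer silently clobber the other stream's offset. The class and method names here are illustrative only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class OffsetWriteRace {

  // Atomic create-if-absent: CREATE_NEW fails if the file already exists,
  // mirroring the old file.create() behavior, so the second stream loses.
  static boolean tryAtomicCreate(Path file, String offset) {
    try {
      Files.write(file, offset.getBytes(), StandardOpenOption.CREATE_NEW);
      return true;
    } catch (IOException e) {
      return false; // FileAlreadyExistsException for the losing stream
    }
  }

  // Overwriting write: silently replaces whatever the other stream wrote.
  static void overwrite(Path file, String offset) {
    try {
      Files.write(file, offset.getBytes(),
          StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static String read(Path file) {
    try {
      return new String(Files.readAllBytes(file));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempDirectory("offsets").resolve("initial-offset");

    // Old semantics: only one of two concurrent streams can create the file.
    System.out.println(tryAtomicCreate(file, "snapshot-1")); // true
    System.out.println(tryAtomicCreate(file, "snapshot-2")); // false
    System.out.println(read(file));                          // snapshot-1

    // New semantics: the later writer wins, clobbering stream 1's state.
    overwrite(file, "snapshot-2");
    System.out.println(read(file));                          // snapshot-2
  }
}
```

With the atomic create, the loser gets an explicit failure it can react to; with the overwrite, both streams "succeed" and the last writer's offset becomes the starting state any new stream observes.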
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]