mxm commented on code in PR #14358:
URL: https://github.com/apache/iceberg/pull/14358#discussion_r2439289983
##########
flink/v2.1/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/DynamicWriteResultAggregator.java:
##########
@@ -118,6 +114,14 @@ public void prepareSnapshotPreBarrier(long checkpointId)
throws IOException {
this.lastCheckpointId = checkpointId;
+ // Pause eviction on the cache for ManifestOutputFileFactory.
+ // This will materialize a list of all output file factories required to
write delta manifests.
+ // Some of the output file factories may already be cached, and we will
reuse those. We must
+ // absolutely avoid re-creating any output file factories _during_ writing
manifests, otherwise
+ // cache eviction may reset the manifest file names for multiple
WriteResults for a given table
+ // which will overwrite already written manifest files!
+ outputFileFactories.haltEviction();
Review Comment:
I had something like this in the initial version, but I was a bit hesitant
to change the file name format due to potential other issues like hitting max
file name limits. Also I tried adding a long-lived counter but I didn't like
that we would have to maintain that externally or via a static counter in the
factory class.
I agree with your concerns though. A UUID probably is the simplest and
cleanest approach.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]