pvary commented on code in PR #14358: URL: https://github.com/apache/iceberg/pull/14358#discussion_r2441590172
########## flink/v2.1/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/DynamicWriteResultAggregator.java:
##########
@@ -61,23 +58,24 @@ class DynamicWriteResultAggregator

   private static final Logger LOG = LoggerFactory.getLogger(DynamicWriteResultAggregator.class);
   private static final byte[] EMPTY_MANIFEST_DATA = new byte[0];
-  private static final Duration CACHE_EXPIRATION_DURATION = Duration.ofMinutes(1);

   private final CatalogLoader catalogLoader;
+  private final int cacheMaximumSize;
   private long lastCheckpointId = CheckpointIDCounter.INITIAL_CHECKPOINT_ID - 1;

   private transient Map<WriteTarget, Collection<DynamicWriteResult>> results;
-  private transient Cache<String, Map<Integer, PartitionSpec>> specs;
-  private transient Cache<String, ManifestOutputFileFactory> outputFileFactories;
+  private transient Map<String, Map<Integer, PartitionSpec>> specs;

Review Comment:
   The Dynamic Sink is built on the premise that it is a long-running process (potentially running forever) which can handle as many tables as needed. The number of target tables is therefore potentially unbounded, so if the set of target tables keeps changing and we never clean up old data, this map effectively becomes a memory leak. The map is also stored per task, which can waste a serious amount of memory when the parallelism is high. An LRU cache is a low-cost solution that we already reuse in other operators, where the cached objects are even bigger.
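To illustrate the reviewer's point, here is a minimal sketch of a size-bounded LRU map using only the JDK's access-ordered `LinkedHashMap`. This is an illustration of the eviction behavior being asked for, not the actual Iceberg implementation (the Iceberg codebase uses a Caffeine-based cache; the class and field names below are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a map capped at `maximumSize` entries that evicts the
// least-recently-used entry on overflow, so long-lived operators tracking an
// ever-changing set of tables cannot grow without bound.
class BoundedLruMap<K, V> extends LinkedHashMap<K, V> {
  private final int maximumSize;

  BoundedLruMap(int maximumSize) {
    // accessOrder=true makes iteration order = least-recently-accessed first
    super(16, 0.75f, true);
    this.maximumSize = maximumSize;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // Evict the least-recently-used entry once the cap is exceeded.
    return size() > maximumSize;
  }
}
```

With a cap of 2, inserting `a` and `b`, touching `a`, then inserting `c` evicts `b` (the least recently used), while `a` and `c` remain cached.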
