pvary commented on code in PR #14358: URL: https://github.com/apache/iceberg/pull/14358#discussion_r2441590172
########## flink/v2.1/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/DynamicWriteResultAggregator.java:
##########
@@ -61,23 +58,24 @@ class DynamicWriteResultAggregator

   private static final Logger LOG = LoggerFactory.getLogger(DynamicWriteResultAggregator.class);
   private static final byte[] EMPTY_MANIFEST_DATA = new byte[0];
-  private static final Duration CACHE_EXPIRATION_DURATION = Duration.ofMinutes(1);

   private final CatalogLoader catalogLoader;
+  private final int cacheMaximumSize;
   private long lastCheckpointId = CheckpointIDCounter.INITIAL_CHECKPOINT_ID - 1;

   private transient Map<WriteTarget, Collection<DynamicWriteResult>> results;
-  private transient Cache<String, Map<Integer, PartitionSpec>> specs;
-  private transient Cache<String, ManifestOutputFileFactory> outputFileFactories;
+  private transient Map<String, Map<Integer, PartitionSpec>> specs;

Review Comment:
   The Dynamic Sink is built on the premise that it is a long-running process (potentially running forever) which can handle as many tables as needed. The number of target tables is therefore potentially unbounded, so if the set of target tables keeps changing and we never clean up old data, this map effectively becomes a memory leak. The map is also stored per task, which can waste a serious amount of memory when the parallelism is high. An LRU cache is a low-cost solution that we already reuse in other operators, where the cached objects are even bigger.
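To illustrate the reviewer's point, here is a minimal sketch of a size-bounded LRU map using only the JDK's access-ordered `LinkedHashMap`. This is an illustration of the eviction behavior being asked for, not the actual Iceberg implementation (the Iceberg codebase uses a Caffeine-based cache; the class and field names below are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a map capped at `maximumSize` entries that evicts the
// least-recently-used entry on overflow, so long-lived operators tracking an
// ever-changing set of tables cannot grow without bound.
class BoundedLruMap<K, V> extends LinkedHashMap<K, V> {
  private final int maximumSize;

  BoundedLruMap(int maximumSize) {
    // accessOrder=true makes iteration order = least-recently-accessed first
    super(16, 0.75f, true);
    this.maximumSize = maximumSize;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // Evict the least-recently-used entry once the cap is exceeded.
    return size() > maximumSize;
  }
}
```

With a cap of 2, inserting `a` and `b`, touching `a`, then inserting `c` evicts `b` (the least recently used), while `a` and `c` remain cached.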
