pnowojski commented on code in PR #21503:
URL: https://github.com/apache/flink/pull/21503#discussion_r1102695388
##########
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/SubtaskCheckpointCoordinatorImpl.java:
##########
@@ -177,6 +182,14 @@ class SubtaskCheckpointCoordinatorImpl implements SubtaskCheckpointCoordinator {
this.checkpoints = new HashMap<>();
this.lock = new Object();
this.asyncOperationsThreadPool = checkNotNull(asyncOperationsThreadPool);
+ this.asyncDisposeThreadPool =
+ new ThreadPoolExecutor(
+ 0,
+ 4,
+ 60L,
+ TimeUnit.SECONDS,
+ new LinkedBlockingQueue<>(),
+ new ExecutorThreadFactory("AsyncDispose"));
Review Comment:
Fair point. In that case I would just limit the size of the
`asyncOperationsThreadPool` to something like `maxConcurrentCheckpoints + 1`.
It would be sensible to back-pressure newer checkpoints if the system is not
keeping up with deleting old checkpoints. With `maxConcurrentCheckpoints + 1`
we would more or less adhere to the `maxConcurrentCheckpoints` configuration,
while still allowing a small leeway: N ongoing concurrent checkpoints plus the
clean-up of one aborted checkpoint at the same time.
Actually, your current design creates a risk of a resource leak if new files
are created faster than old ones are deleted (in the case of continuously
failing checkpoints).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]