[ https://issues.apache.org/jira/browse/FLINK-23654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17396663#comment-17396663 ]
Till Rohrmann commented on FLINK-23654:
---------------------------------------
Thanks for reporting this issue and for the analysis, [~raganico] and
[~Thesharing]. I think you are right that it is not good that we use the same
thread pool for short-lived future callback executions and heavy I/O
operations. I think that splitting the thread pool and introducing advanced
configuration options to make the number of threads configurable makes a lot
of sense. Do either of you have time to work on this issue?
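
To illustrate the split discussed above, here is a minimal sketch in plain
java.util.concurrent terms. It is not Flink code: the class
SplitExecutorsSketch and the method runCallbackWhileIoBlocked are hypothetical
names used only for illustration. The I/O pool is deliberately saturated with
blocking tasks, and a short callback still completes because it runs on its
own pool.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; this is not Flink's JobManagerSharedServices. */
public final class SplitExecutorsSketch {

    /**
     * Saturates a dedicated I/O pool with blocking tasks, then runs a short
     * callback on a separate future pool and returns the callback's result.
     */
    public static String runCallbackWhileIoBlocked(int futureThreads, int ioThreads)
            throws Exception {
        ScheduledExecutorService futureExecutor =
                Executors.newScheduledThreadPool(futureThreads);
        ExecutorService ioExecutor = Executors.newFixedThreadPool(ioThreads);
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Occupy every I/O thread with a task that blocks until released.
            for (int i = 0; i < ioThreads; i++) {
                ioExecutor.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The callback uses the other pool, so it is not queued behind the I/O work.
            return CompletableFuture
                    .supplyAsync(() -> "callback done", futureExecutor)
                    .get(5, TimeUnit.SECONDS);
        } finally {
            release.countDown();
            ioExecutor.shutdown();
            futureExecutor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runCallbackWhileIoBlocked(1, 2));
    }
}
```

If both kinds of work were submitted to a single pool with as many threads as
there are blocking tasks, the callback would wait until one of those tasks
finished, which is the degradation described in the issue.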
> Allow configurable number of jobmanager-future threads
> ------------------------------------------------------
>
> Key: FLINK-23654
> URL: https://issues.apache.org/jira/browse/FLINK-23654
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / REST
> Affects Versions: 1.14.0, 1.12.5, 1.13.2
> Reporter: Nicolas Raga
> Priority: Critical
> Fix For: 1.14.0
>
>
> The JobManagerSharedServices futureExecutor is used for asynchronous requests
> in multiple Flink components. When the JobMaster creates the execution graph,
> it passes the *scheduledExecutorService* (which is the
> jobManagerSharedServices.getScheduledExecutorService) to both the
> *futureExecutor* and the *ioExecutor*. In the ExecutionGraph, the
> *ioExecutor* is the executor which is used to execute blocking I/O
> operations. It is also passed in to the *CheckpointCoordinator*, which uses it
> for asynchronous calls like disposing pending checkpoints, cleaning up failed
> checkpoints, etc. The *futureExecutor* is even passed on to the *Execution*
> class, where it is then used to dispatch callbacks from futures and
> asynchronous RPC calls from within vertexes! Lastly, this executor is also
> used to process asynchronous requests from the Flink REST endpoint.
>
> Hence, using the endpoint for monitoring during large checkpoints or blocking
> I/O operations on the same thread pool causes degraded performance on the
> endpoint. We have already been able to test that an increase in this thread
> count allows faster responses to incoming requests. We can begin by simply
> exposing a *jobmanager.future-thread.factor* that can provide a factor above
> the number of CPUs. Afterwards, we can consider a dedicated thread pool for
> blocking I/O that won't cause degradation of performance for the REST
> endpoint.
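
As a rough sketch of how a factor such as the proposed
*jobmanager.future-thread.factor* could translate into a thread count: the
issue does not specify the exact semantics, so the code below assumes the
factor is a multiplier on the number of CPUs, rounded up, with a floor of one
thread. The class FutureThreadFactorSketch and the method futureThreads are
hypothetical and are not part of Flink.

```java
/** Hypothetical illustration only; the multiplier semantics are an assumption. */
public final class FutureThreadFactorSketch {

    /**
     * Returns the pool size for the given CPU count and factor:
     * ceil(availableCpus * factor), but never fewer than one thread.
     */
    public static int futureThreads(int availableCpus, double factor) {
        if (availableCpus < 1 || factor <= 0.0) {
            throw new IllegalArgumentException(
                    "availableCpus must be >= 1 and factor must be > 0");
        }
        return Math.max(1, (int) Math.ceil(availableCpus * factor));
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // With a factor of 2.0, the pool gets twice as many threads as CPUs.
        System.out.println(futureThreads(cpus, 2.0));
    }
}
```

Under these assumptions, 4 CPUs with a factor of 2.0 yield 8 threads, and a
factor of 1.0 yields one thread per CPU.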
--
This message was sent by Atlassian Jira
(v8.3.4#803005)