Matthias Pohl created FLINK-32098:
-------------------------------------
Summary: Dispatcher#submitJob calls
Dispatcher#isInGloballyTerminalState up to three times which might be expensive
due to IO
Key: FLINK-32098
URL: https://issues.apache.org/jira/browse/FLINK-32098
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination
Affects Versions: 1.16.1, 1.17.0, 1.18.0
Reporter: Matthias Pohl
{{Dispatcher#submitJob}} calls {{Dispatcher#isInGloballyTerminalState}} up to
three times (1x through {{Dispatcher#isDuplicateJob}} and 2x directly), and each
call goes to {{JobResultStore#hasJobResultEntry}}. {{hasJobResultEntry}} calls
both {{hasDirtyJobResultEntry}} and {{hasCleanJobResultEntry}} if the underlying
job hasn't completed globally yet. Both calls run {{FileSystem#exists}} on a
non-existent file, which can be quite expensive (depending on the
{{FileSystem}} implementation for object storage) since it might require a full
table scan.
To be honest, nobody has complained so far. But we might want to either
reconsider the {{FileSystemJobResultStore}}/{{JobResultStore#hasJobResultEntry}}
implementation or, at least, reduce the number of {{isInGloballyTerminalState}}
calls in the {{Dispatcher}} and document the performance issue in the JavaDoc.
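
A minimal sketch of the second option (fewer {{isInGloballyTerminalState}} calls), not the actual Flink code. {{CountingResultStore}}, {{DispatcherSketch}}, {{submitRepeated}}, {{submitOnce}} and the string return values are hypothetical names made up for illustration, and the control flow in {{submitRepeated}} is only an assumed simplification of the pattern described above. Each simulated lookup stands in for one {{FileSystem#exists}} call.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a result store on slow storage; NOT Flink's class.
final class CountingResultStore {
    private final Set<String> cleanEntries = new HashSet<>();
    int existsCalls = 0; // number of simulated FileSystem#exists lookups

    void markFinished(String jobId) {
        cleanEntries.add(jobId);
    }

    // Checks the dirty entry first, then the clean one; each check is one lookup.
    boolean hasJobResultEntry(String jobId) {
        return hasDirtyEntry(jobId) || hasCleanEntry(jobId);
    }

    private boolean hasDirtyEntry(String jobId) {
        existsCalls++;
        return false; // this sketch never creates dirty entries
    }

    private boolean hasCleanEntry(String jobId) {
        existsCalls++;
        return cleanEntries.contains(jobId);
    }
}

// Hypothetical, heavily simplified dispatcher; NOT Flink's Dispatcher.
final class DispatcherSketch {
    final CountingResultStore store = new CountingResultStore();
    private final Set<String> runningJobs = new HashSet<>();

    private boolean isInGloballyTerminalState(String jobId) {
        return store.hasJobResultEntry(jobId);
    }

    // Assumed pattern: one check in the duplicate test, two more afterwards.
    String submitRepeated(String jobId) {
        boolean duplicate = isInGloballyTerminalState(jobId) || runningJobs.contains(jobId);
        if (duplicate) {
            if (isInGloballyTerminalState(jobId)) {
                // a warning would be logged here
            }
            return isInGloballyTerminalState(jobId) ? "DUPLICATE_TERMINATED" : "DUPLICATE_RUNNING";
        }
        runningJobs.add(jobId);
        return "ACCEPTED";
    }

    // Evaluates the terminal-state check once and reuses the result.
    String submitOnce(String jobId) {
        boolean terminated = isInGloballyTerminalState(jobId);
        if (terminated) {
            return "DUPLICATE_TERMINATED";
        }
        if (runningJobs.contains(jobId)) {
            return "DUPLICATE_RUNNING";
        }
        runningJobs.add(jobId);
        return "ACCEPTED";
    }
}
```

In this sketch, resubmitting an already finished job costs six simulated lookups with {{submitRepeated}} and two with {{submitOnce}}. The first submission of a new job costs two lookups in both variants, so the saving only shows up on the duplicate path.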