Stephan Ewen created FLINK-24343:
------------------------------------
Summary: Revisit Scheduler and Coordinator Startup Procedure
Key: FLINK-24343
URL: https://issues.apache.org/jira/browse/FLINK-24343
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination
Affects Versions: 1.13.2, 1.14.0
Reporter: Stephan Ewen
Fix For: 1.15.0
We need to re-examine the startup procedure of the scheduler, and how it
interacts with the startup of the operator coordinators.
We need to make sure the following conditions are met:
- The Operator Coordinators are started before the first action happens that
they need to be informed of. That includes a task being ready, a checkpoint
happening, etc.
- The scheduler must be started to the point that it can handle
"failGlobal()" calls, because the coordinators might trigger that during their
startup when an exception in "start()" occurs.
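
To illustrate the two conditions, here is a minimal sketch of one startup
ordering that would satisfy both. It is not Flink's actual scheduler code: the
names StartupOrderSketch, Scheduler, Coordinator and taskReady are hypothetical
stand-ins, and only "start()" and "failGlobal()" are taken from the description
above. The scheduler first becomes able to handle global failures, then starts
the coordinators, and only afterwards performs actions the coordinators need to
be informed of.

```java
import java.util.ArrayList;
import java.util.List;

public class StartupOrderSketch {

    /** Hypothetical stand-in for an operator coordinator. */
    interface Coordinator {
        void start() throws Exception;
        void taskReady(String task);
    }

    /** Hypothetical scheduler; an assumption for illustration, not Flink's class. */
    static final class Scheduler {
        final List<String> log = new ArrayList<>();
        private final List<Coordinator> coordinators = new ArrayList<>();
        private boolean canHandleGlobalFailure = false;
        private boolean failed = false;

        void addCoordinator(Coordinator c) {
            coordinators.add(c);
        }

        void failGlobal(Throwable cause) {
            // Condition 2: this must already be usable while coordinators start.
            if (!canHandleGlobalFailure) {
                throw new IllegalStateException("failGlobal() called before scheduler was ready");
            }
            failed = true;
            log.add("failGlobal: " + cause.getMessage());
        }

        void start() {
            // Step 1: bring the scheduler far enough up to accept failGlobal().
            canHandleGlobalFailure = true;
            log.add("scheduler ready for failGlobal");

            // Step 2: start the coordinators; a failure in start() goes to failGlobal().
            for (Coordinator c : coordinators) {
                try {
                    c.start();
                    log.add("coordinator started");
                } catch (Exception e) {
                    failGlobal(e);
                    break;
                }
            }

            // Step 3 (condition 1): only now do things coordinators must hear about.
            if (!failed) {
                log.add("deploying tasks");
                for (Coordinator c : coordinators) {
                    c.taskReady("task-0");
                }
            }
        }
    }

    /** Runs the sketch with one coordinator and returns the recorded event order. */
    static List<String> run(boolean failInStart) {
        Scheduler scheduler = new Scheduler();
        scheduler.addCoordinator(new Coordinator() {
            @Override
            public void start() throws Exception {
                if (failInStart) {
                    throw new Exception("boom");
                }
            }

            @Override
            public void taskReady(String task) {
                scheduler.log.add("coordinator notified: " + task);
            }
        });
        scheduler.start();
        return scheduler.log;
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

In this sketch a coordinator that throws in "start()" reaches a scheduler that
can already process the failure, and no task is reported as ready before every
coordinator has started. Whether the real fix should follow this exact order is
part of what needs to be re-examined.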
/cc [~chesnay]