To be honest I'm terrified at the idea of splitting the Dispatcher into
several processes, even more so if this is supposed to be opt-in and
specific to session mode.
It would fragment the coordination layer even more than it already is,
and make ops more complicated (yet another set of processes to monitor,
configure etc.).
I'm not convinced that this proposal really gets us a lot of benefits,
and would rather propose that you split your single session cluster into
multiple session clusters (with a scheduling component in front of them
to distribute jobs) to even out the load.
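As a rough illustration of that alternative, a scheduling component in
front of several session clusters could be as small as a least-loaded
router. This is only a sketch under stated assumptions: the class
SessionClusterRouter and its methods are hypothetical, not part of
Flink, and a real component would also have to submit the job to the
chosen cluster and track its actual state.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical front-end that assigns each job to the least-loaded session cluster. */
public class SessionClusterRouter {
    // Cluster name -> number of jobs currently assigned (insertion order breaks ties).
    private final Map<String, Integer> assignedJobs = new LinkedHashMap<>();
    // Job id -> cluster it was routed to, so the count can be decremented later.
    private final Map<String, String> jobToCluster = new HashMap<>();

    public SessionClusterRouter(String... clusterNames) {
        if (clusterNames.length == 0) {
            throw new IllegalArgumentException("at least one session cluster is required");
        }
        for (String name : clusterNames) {
            assignedJobs.put(name, 0);
        }
    }

    /** Picks the cluster with the fewest assigned jobs and records the assignment. */
    public synchronized String route(String jobId) {
        String target = null;
        int min = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> entry : assignedJobs.entrySet()) {
            if (entry.getValue() < min) {
                min = entry.getValue();
                target = entry.getKey();
            }
        }
        assignedJobs.merge(target, 1, Integer::sum);
        jobToCluster.put(jobId, target);
        return target;
    }

    /** Frees the slot of a finished job; unknown job ids are ignored. */
    public synchronized void jobFinished(String jobId) {
        String cluster = jobToCluster.remove(jobId);
        if (cluster != null) {
            assignedJobs.merge(cluster, -1, Integer::sum);
        }
    }
}
```

Counting assigned jobs is the simplest possible load signal; it keeps
each session cluster an ordinary Flink deployment and leaves all
routing logic outside of Flink.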
> The currently idling JobManagers could be utilized to take over some
> of the workload from the leader.
This would also be the path I would go down if we were to tackle this.
On 17/08/2022 16:22, Matthias Pohl wrote:
Hi Conrad,
thanks for reaching out to the community with your proposal. I looked
through FLIP-257 [1]. Your motivation sounds interesting. Can you
elaborate a bit more on the concrete use cases you have in mind? How
do these match the use cases of the two favored execution modes, i.e.
Flink's Session and Application mode?
As mentioned in [2], Application Mode should be preferred for single
long-running jobs to isolate the resources of each of those jobs from
each other. In contrast, Session Mode is the natural choice when
running rather small/short-lived jobs (e.g. FlinkSQL queries) or when
deploying some kind of dev environment for testing out job
implementations. It feels like your use case sits somewhere in between
the two. It would be interesting to get a better understanding of where
you're coming from. Maybe you could provide some workload statistics?
That considered, I guess it's a topic worth looking into. Here are a
few thoughts after looking into FLIP-257:
- As far as I can see, the BlobServer is used for sharing
configuration information (e.g. Classpath information) as part of the
ExecutionGraph instantiation [3]. The JobGraph is not persisted
through the BlobServer but rather stored in the JobGraphStore backed
by the HighAvailabilityServices implementation. The HA side is not
really covered in FLIP-257, yet.
- The approach of having the current Dispatcher living next to the new
JobMasterDispatcher (that encapsulates the logic around distributing
the workload onto multiple runners) leaves me with some doubt whether
there wouldn't be a better separation of concerns here. What about
leaving the Dispatcher as is but adding some abstraction between
JobManagerRunner/JobMaster and the Dispatcher that hides the logic
around whether these instances are "deployed" on the same machine or
somewhere else?
- About distributing JobManager workload: The JobManager already
utilizes leader election for faster recovery. Hence, one can set up
multiple JobManagers in idle mode which wait to gain leadership and
pick up the work (i.e. recovering the jobs) of the previously failed
JobManager leader. What about utilizing this setup: The currently
idling JobManagers could be utilized to take over some of the workload
from the leader. I haven't thought this through, yet. But I'm
wondering whether that would be a path we could go down. This would
enable us to still stick to the JobManager/TaskManager setup which
users are already used to rather than introducing another type of
cluster node.
- The JobManager initialization logic is kind of tricky to get your
head around. There is some overhead that I hope we can clean up as part
of the efforts of removing the per-Job Mode from Flink [4]. It was
decided to deprecate per-Job Mode in Flink 1.15. But we have to stick
with it for some time (i.e. it's not going to be removed in 1.16)
since it's quite a basic feature that users might rely on. This shouldn't
be a blocker. I just wanted to mention it to have it in the back of
our minds when looking into ways to come up with a solid proposal for
FLIP-257.
- My concern is that this FLIP might turn out to be larger than
expected and that it might be worth cutting it down into smaller
chunks with each being covered in a separate FLIP down the road if we
have some agreement and a clearer picture on how this should be tackled.
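To make the second bullet above a bit more concrete, here is a minimal
sketch of such an abstraction. It is an assumption-laden illustration,
not a design: JobRunnerGateway, LocalJobRunner, RemoteJobRunner and
SketchDispatcher are hypothetical names that do not exist in Flink, and
the "transport" function merely stands in for whatever RPC mechanism
would actually be used.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Sketch only: none of these types exist in Flink. */
public class JobRunnerAbstractionSketch {

    /** What the Dispatcher sees: it starts a job without knowing where the JobMaster runs. */
    public interface JobRunnerGateway {
        CompletableFuture<String> startJob(String jobId);
    }

    /** Runs the job in the same process as the Dispatcher. */
    public static final class LocalJobRunner implements JobRunnerGateway {
        @Override
        public CompletableFuture<String> startJob(String jobId) {
            return CompletableFuture.completedFuture(jobId + " started in-process");
        }
    }

    /** Forwards the request to another process through a stand-in transport function. */
    public static final class RemoteJobRunner implements JobRunnerGateway {
        private final String address;
        private final Function<String, CompletableFuture<String>> transport;

        public RemoteJobRunner(String address,
                               Function<String, CompletableFuture<String>> transport) {
            this.address = address;
            this.transport = transport;
        }

        @Override
        public CompletableFuture<String> startJob(String jobId) {
            // Append the remote address once the (hypothetical) remote side acknowledges.
            return transport.apply(jobId).thenApply(ack -> ack + " on " + address);
        }
    }

    /** Stand-in for the Dispatcher: its submit path is identical for both placements. */
    public static final class SketchDispatcher {
        private final JobRunnerGateway runner;

        public SketchDispatcher(JobRunnerGateway runner) {
            this.runner = runner;
        }

        public CompletableFuture<String> submit(String jobId) {
            return runner.startJob(jobId);
        }
    }
}
```

With this shape, switching between in-process and out-of-process
JobMasters would only change which JobRunnerGateway is handed to the
Dispatcher, while the Dispatcher itself stays as is.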
I'm gonna add Chesnay and David to this discussion.
Best,
Matthias
[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-257+Flink+JobManager+Process+Split
[2]
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/#deployment-modes
[3]
https://github.com/apache/flink/blob/9ed70a1e8b5d59abdf9d7673bc5b44d421140ef0/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/DefaultExecutionGraph.java#L333
[4] https://lists.apache.org/thread/b8g76cqgtr2c515rd1bs41vy285f317n
On Tue, Aug 16, 2022 at 11:43 AM Zheng Yu Chen <jam.gz...@gmail.com>
wrote:
Hi community ~
I think this title should be quite interesting. The idea is to reduce
the workload of the JobManager and make the SessionCluster [2] more
stable in the process of running jobs. I designed a plan for splitting
the JobManager on FLIP-257 [1]:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-257+Flink+JobManager+Process+Split
This proposal describes a scheme for splitting the current process and
a new process implementation that stays compatible with the original
process model: the internal JobMaster component of the JobManager is
split out, and a parameter controls whether this new process is
enabled. In the split scheme, when the user enables it, the JobMaster
runs as an independent service, reducing the workload of the
JobManager. A new Dispatcher communicates and interacts with a single
split-out JobMaster or with multiple JobMasters to achieve job
management.
The main features that it provides are:
- After the user submits a job, the JobMaster thread that used to run
in the JobManager is split out and runs in another process instead.
- Users can deploy multiple instances of the component mentioned above,
each running multiple JobMaster threads, thereby reducing the workload
of the JobManager.
Some of the challenging use cases that these features solve are:
- Compatible with the original job running mode (run JobMaster thread
on JobManager)
- Implement a new Dispatcher that forwards client operations related to
jobs
I would love to hear and address your thoughts and feedback, and if
possible drive FLIP-257!
[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-257+Flink+JobManager+Process+Split
[2]
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/overview/#session-mode
--
Have a nice day ~
ConradJam