Re: [Discuss] Let's Session Cluster JobManager take a breather (FLIP-257: Flink JobManager Process Split)

Chesnay Schepler Wed, 17 Aug 2022 07:32:01 -0700

To be honest I'm terrified at the idea of splitting the Dispatcher intoseveral processes, even more so if this is supposed to be opt-in andspecific to session mode.It would fragment the coordination layer even more than it already is,and make ops more complicated (yet another set of processes to monitor,configure etc.).

I'm not convinced that this proposal really gets us a lot of benefits;and would rather propose that you split your single session cluster intomultiple session clusters (with the scheduling component in front of itto distribute jobs) to even the load.

> The currently idling JobManagers could be utilized to take over someof the workload from the leader.


This would also be the path I would go down if we'd try to tackle this.

On 17/08/2022 16:22, Matthias Pohl wrote:

Hi Conrad,
thanks for reaching out to the community with your proposal. I lookedthrough FLIP-257 [1]. Your motivation sounds interesting. Can youelaborate a bit more on the concrete use-cases you have in mind? Howdo these match the user-cases of the two favored execution modes, i.e.Flink's Session and Application mode?
As mentioned in [2], Application Mode should be preferred for singlelong-running jobs to isolate the resources of each of those jobs fromeach other. In contrast, Session Mode is the natural choice whenrunning rather small/short-lived jobs (e.g. FlinkSQL queries) or whendeploying some kind of dev environment for testing out jobimplementations. It feels like your use-case is somewhere in between abit? It would be interesting to get a better understanding of whereyou're coming from. Maybe, you could provide some workload statistics?
That considered, I guess it's a topic worth looking into. Here are afew thoughts after looking into FLIP-257:- As far as I can see, the BlobServer is used for sharingconfiguration information (e.g. Classpath information) as part of theExecutionGraph instantiation [3]. The JobGraph is not persistedthrough the BlobServer but rather stored in the JobGraphStore backedby the HighAvailabilityServices implementation. The HA side is notreally covered in FLIP-257, yet.- The approach of having the current Dispatcher living next to the newJobMasterDispatcher (that encapsulates the logic around distributingthe workload onto multiple runners) leaves me with some doubt whetherthere wouldn't be a better separation of concerns here. What aboutleaving the Dispatcher as is but adding some abstraction betweenJobManagerRunner/JobMaster and the Dispatcher that hides the logicaround whether these instances are "deployed" on the same machine orsomewhere else.- About distributing JobManager workload: The JobManager alreadyutilizes leader election for faster recovery. Hence, one can set upmultiple JobManagers in idle mode which wait to gain leadership andpick up the work (i.e. recovering the jobs) of the previously failedJobManager leader. What about utilizing this setup: The currentlyidling JobManagers could be utilized to take over some of the workloadfrom the leader. I haven't thought this through, yet. But I'mwondering whether that would be a path we could go down. This wouldenable us to still stick to the JobManager/TaskManager setup whichusers are already used to rather than introducing another type ofcluster node.- The JobManager initialization logic is kind of tricky to get yourhead around. There is some overhead, I hope, we could clean up as partof the efforts of removing the per-Job Mode from Flink [4]. It wasdecided to deprecate per-Job Mode in Flink 1.15. But we have to stickwith it for some time (i.e. it's not going to be removed in 1.16)since it's a quite basic feature users might rely on. This shouldn'tbe a blocker. I just wanted to mention it to have it in the back ofour minds when looking into ways to come up with a solid proposal forFLIP-257.- My concern is that this FLIP might turn out to be larger thanexpected and that it might be worth cutting it down into smallerchunks with each being covered in a separate FLIP down the road if wehave some agreement and a clearer picture on how this should be tackled.
I'm gonna add Chesnay and David to this discussion.

Best,
Matthias
[1]https://cwiki.apache.org/confluence/display/FLINK/FLIP-257+Flink+JobManager+Process+Split[2]https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/#deployment-modes[3]https://github.com/apache/flink/blob/9ed70a1e8b5d59abdf9d7673bc5b44d421140ef0/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/DefaultExecutionGraph.java#L333
[4] https://lists.apache.org/thread/b8g76cqgtr2c515rd1bs41vy285f317n
On Tue, Aug 16, 2022 at 11:43 AM Zheng Yu Chen <jam.gz...@gmail.com>wrote:
    Hi community ~

    I think this title should be quite interesting. The idea is to
    reduce the
    workload of the JobManager and make the SessionCluster [2] more
    stable in
    the process of running jobs. I designed a plan for splitting the
    JobManager
    on FLIP-257 [1]:
    
https://cwiki.apache.org/confluence/display/FLINK/FLIP-257+Flink+JobManager+Process+Split
    
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-257+Flink+JobManager+JobMaster+Thread+Split+to+Process>

    This proposal proposes a splitting scheme for the current process
    and a new
    process implementation idea that is compatible with the original
    process
    model: splitting the internal JobMaster component of the
    JobManager, and
    controlling whether to enable this new process through a parameter
    In the
    split scheme, when the user configures, the JobMaster will make it
    run as
    an independent service, reducing the workload of the JobManager. By
    implementing a new Dispatcher to communicate and interact with a
    single
    split JobMaster or multiple JobMasters, to achieve job management

    The main features that it provides is:

       - After the user submits the job, the JobMaster thread was
    split into
       other processes to run in the past. They no longer run in the
    JobManager,
       but in other processes.
       - Users can deploy multiple components mentioned above, which run
       multiple JobMaster threads, thereby reducing the workload of
    the JobManager

    Some of the challenging use cases that these features solve are:

       - Compatible with the original job running mode (run JobMaster
    Thread on
       JobManager)
       - Implement a new Dispatcher that forwards client operations
    related to
       jobs


     I would love to hear and address your thoughts and feedback , and if
    possible drive a FLIP-257 ！


    [1]
    
https://cwiki.apache.org/confluence/display/FLINK/FLIP-257+Flink+JobManager+Process+Split
    
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-257+Flink+JobManager+JobMaster+Thread+Split+to+Process>

    [2]
    
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/overview/#session-mode
--
    Have a nice day ~

    ConradJam

Re: [Discuss] Let's Session Cluster JobManager take a breather (FLIP-257: Flink JobManager Process Split)

Reply via email to