Re: [DISCUSS] FLIP-6 - Flink Deployment and Process Model - Standalone, Yarn, Mesos, Kubernetes, etc.

Wright, Eron Fri, 05 Aug 2016 09:20:17 -0700

Let me rephrase my comment on the dispatcher.    I mean that its API would be 
job-centric, i.e. with operations like `execute(jobspec)` rather than 
operations like `createSession` that the status-quo would suggest.


Since writing those comments I’ve put more time into developing the Mesos 
dispatcher with FLIP-6 in mind.    I see that Till is spinning up an effort 
too, so we should all sync up in the near future.

Eron



> On Aug 5, 2016, at 7:30 AM, Stephan Ewen <se...@apache.org> wrote:
> 
> Hi Eron!
> 
> Some comments on your comments:
> 
> *Dispatcher*
>  - The dispatcher should NOT be job-centric. The dispatcher should take
> over the "multi job" responsibilities here, now that the JobManager is
> single-job only.
>  - An abstract dispatcher would be great. It could implement the
> connection/HTTP elements and have an abstract method to start a job
>    -> Yarn - use YarnClusterClient to start a YarnJob
>    -> Mesos - same thing
>    -> Standalone - spawn a JobManager
> 
> *Client*
> This is an interesting point. Max is currently refactoring the clients into
>   - Cluster Client (with specialization for Yarn, Standalone) to launch
> jobs and control a cluster (yarn session, ...)
>   - Job Client, which is connected to a single job and can issue commands
> to that job (cancel/stop/checkpoint/savepoint/change-parallelism)
> 
> Let's try and get his input on this.
> 
> 
> *RM*
> Agreed - the base RM is "stateless", specialized RMs can behave different,
> if they need to.
> RM fencing must be generic - all cluster types can suffer from orphaned
> tasks (Yarn as well, I think)
> 
> 
> *User Code*
> I think in the cases where processes/containers are launched per-job, this
> should always be feasible. It is a nice optimization that I think we should
> do where ever possible. Makes users' life with respect to classloading much
> easier.
> Some cases with custom class loading are currently tough in Flink - that
> way, these jobs would at least run in the yarn/mesos individual job mode
> (not the session mode still, that one needs dynamic class loading).
> 
> *Standalone Security*
> That is a known limitation and fine for now, I think. Whoever wants proper
> security needs to go to Yarn/Mesos initially. Standalone v2.0 may change
> that.
> 
> Greetings,
> Stephan
> 
> 
> 
> On Sat, Jul 30, 2016 at 12:26 AM, Wright, Eron <ewri...@live.com> wrote:
> 
>> The design looks great - it solves for very diverse deployment modes,
>> allows for heterogeneous TMs, and promotes job isolation.
>> 
>> Some feedback:
>> 
>> *Dispatcher*
>> The dispatcher concept here expands nicely on what was introduced in the
>> Mesos design doc (MESOS-1984).  The most significant difference being the
>> job-centric orientation of the dispatcher API.   FLIP-6 seems to eliminate
>> the concept of a session (or, defines it simply as the lifecycle of a JM);
>> is that correct?    Do you agree I should revise the Mesos dispatcher
>> design to be job-centric?
>> 
>> I'll be taking the first crack at implementing the dispatcher (for Mesos
>> only) in MESOS-1984 (T2).   I’ll keep FLIP-6 in mind as I go.
>> 
>> The dispatcher's backend behavior will vary significantly for Mesos vs
>> standalone vs others.   Assumedly a base class with concrete
>> implementations will be introduced.  To echo the FLIP-6 design as I
>> understand it:
>> 
>> 1) Standalone
>>   a) The dispatcher process starts an RM, dispatcher frontend, and
>> "local" dispatcher backend at startup.
>>   b) Upon job submission, the local dispatcher backend creates an
>> in-process JM actor for the job.
>>   c) The JM allocates slots as normal.   The RM draws from its pool of
>> registered TM, which grows and shrinks due (only) to external events.
>> 
>> 2) Mesos
>>   a) The dispatcher process starts a dispatcher frontend and "Mesos"
>> dispatcher backend at startup.
>>   b) Upon job submission, the Mesos dispatcher backend creates a Mesos
>> task (dubbed an "AppMaster") which contains a JM/RM for the job.
>>   c) The system otherwise functions as described in the Mesos design doc.
>> 
>> *Client*
>> I'm concerned about the two code paths that the client uses to launch a
>> job (with-dispatcher vs without-dispatcher).   Maybe it could be unified by
>> saying that the client always calls the dispatcher, and that the dispatcher
>> is hostable in either the client or in a separate process.  The only
>> variance would be the client-to-dispatcher transport (local vs HTTP).
>> 
>> *RM*
>> On the issue of RM statefulness, we can say that the RM does not persist
>> slot allocation (the ground truth is in the TM), but may persist other
>> information (related to cluster manager interaction).  For example, the
>> Mesos RM persists the assigned framework identifier and per-task planning
>> information (as is highly recommended by the Mesos development guide).
>> 
>> On RM fencing, I was already wondering whether to add it to the Mesos RM,
>> so it is nice to see it being introduced more generally.   My rationale is,
>> the dispatcher cannot guarantee that only a single RM is running, because
>> orphaned tasks are possible in certain Mesos failure situations.
>> Similarly, I’m unsure whether YARN provides a strong guarantee about the
>> AM.
>> 
>> *User Code*
>> Having job code on the system classpath seems possible in only a subset of
>> cases.   The variability may be complex.   How important is this
>> optimization?
>> 
>> *Security Implications*
>> It should be noted that the standalone embodiment doesn't offer isolation
>> between jobs.  The whole system will have a single security context (as it
>> does now).
>> 
>> Meanwhile, the ‘high-trust’ nature of the dispatcher in other scenarios is
>> rightly emphasized.  The fact that user code shouldn't be run in the
>> dispatcher process (except in standalone) must be kept in mind.   The
>> design doc of FLINK-3929 (section C2) has more detail on that.
>> 
>> 
>> -Eron
>> 
>> 
>>> On Jul 28, 2016, at 2:22 AM, Maximilian Michels <m...@apache.org> wrote:
>>> 
>>> Hi Stephan,
>>> 
>>> Thanks for the nice wrap-up of ideas and discussions we had over the
>>> last months (not all on the mailing list though because we were just
>>> getting started with the FLIP process). The document is very
>>> comprehensive and explains the changes in great details, even up to
>>> the message passing level.
>>> 
>>> What I really like about the FLIP is that we delegate multi-tenancy
>>> away from the JobManager to the resource management framework and the
>>> dispatchers. This will help to make the JobManager component cleaner
>>> and simpler. The prospect of having the user jars directly in the
>>> system classpath of the workers, instead of dealing with custom class
>>> loaders, is very nice.
>>> 
>>> The model we have for acquiring and releasing resources wouldn't work
>>> particularly well with all the new deployment options, so +1 on a new
>>> task slot request/offer system and +1 for making the ResourceManager
>>> responsible for TaskManager registration and slot management. This is
>>> well aligned with the initial idea of the ResourceManager component.
>>> 
>>> We definitely need good testing for these changes since the
>>> possibility of bugs increases with the additional number of messages
>>> introduced.
>>> 
>>> The only thing that bugs me is whether we make the Standalone mode a
>>> bit less nice to use. The initial bootstrapping of the nodes via the
>>> local dispatchers and the subsequent registration of TaskManagers and
>>> allocation of slots could cause some delay. It's not a major concern
>>> though because it will take little time compared to the actual job run
>>> time (unless you run a tiny WordCount).
>>> 
>>> Cheers,
>>> Max
>>> 
>>> 
>>> 
>>> 
>>> On Fri, Jul 22, 2016 at 9:26 PM, Stephan Ewen <se...@apache.org> wrote:
>>>> Hi all!
>>>> 
>>>> Here comes a pretty big FLIP: "Improvements to the Flink Deployment and
>>>> Process Model", to better support Yarn, Mesos, Kubernetes, and whatever
>>>> else Google, Elon Musk, and all the other folks will think up next.
>>>> 
>>>> https://cwiki.apache.org/confluence/pages/viewpage.
>> action?pageId=65147077
>>>> 
>>>> It is a pretty big FLIP where I took input and thoughts from many
>> people,
>>>> like Till, Max, Xiaowei (and his colleagues), Eron, and others.
>>>> 
>>>> The core ideas revolve around
>>>> - making the JobManager in its core a per-job component (handle multi
>>>> tenancey outside the JobManager)
>>>> - making resource acquisition and release more dynamic
>>>> - tying deployments more naturally to jobs where desirable
>>>> 
>>>> 
>>>> Let's get the discussion started...
>>>> 
>>>> Greetings,
>>>> Stephan
>> 
>>

Re: [DISCUSS] FLIP-6 - Flink Deployment and Process Model - Standalone, Yarn, Mesos, Kubernetes, etc.

Reply via email to