Re: [DISCUSS] FLIP-6 - Flink Deployment and Process Model - Standalone, Yarn, Mesos, Kubernetes, etc.

Maximilian Michels Thu, 28 Jul 2016 02:23:35 -0700

Hi Stephan,

Thanks for the nice wrap-up of the ideas and discussions we had over
the last few months (not all of them on the mailing list, because we
were just getting started with the FLIP process). The document is very
comprehensive and explains the changes in great detail, even down to
the message-passing level.

What I really like about the FLIP is that we delegate multi-tenancy
away from the JobManager to the resource management framework and the
dispatchers. This will help to make the JobManager component cleaner
and simpler. The prospect of having the user jars directly in the
system classpath of the workers, instead of dealing with custom class
loaders, is very nice.

The model we have for acquiring and releasing resources wouldn't work
particularly well with all the new deployment options, so +1 on a new
task slot request/offer system and +1 for making the ResourceManager
responsible for TaskManager registration and slot management. This is
well aligned with the initial idea of the ResourceManager component.

We definitely need good testing for these changes, since the risk of
bugs grows with the number of additional messages introduced.

The only thing that bugs me is that we might make the Standalone mode
a bit less nice to use. The initial bootstrapping of the nodes via the
local dispatchers and the subsequent registration of TaskManagers and
allocation of slots could cause some delay. It's not a major concern
though, because it will take little time compared to the actual job run
time (unless you run a tiny WordCount).

Cheers,
Max

On Fri, Jul 22, 2016 at 9:26 PM, Stephan Ewen wrote:
> Hi all!
>
> Here comes a pretty big FLIP: "Improvements to the Flink Deployment and
> Process Model", to better support Yarn, Mesos, Kubernetes, and whatever
> else Google, Elon Musk, and all the other folks will think up next.
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077
>
> It is a pretty big FLIP where I took input and thoughts from many people,
> like Till, Max, Xiaowei (and his colleagues), Eron, and others.
>
> The core ideas revolve around
>   - making the JobManager in its core a per-job component (handle
> multi-tenancy outside the JobManager)
>   - making resource acquisition and release more dynamic
>   - tying deployments more naturally to jobs where desirable
>
>
> Let's get the discussion started...
>
> Greetings,
> Stephan
