Hi Stephan,

Thanks for the nice wrap-up of the ideas and discussions we had over the last months (not all of them on the mailing list, though, because we were just getting started with the FLIP process). The document is very comprehensive and explains the changes in great detail, even down to the message-passing level.

What I really like about the FLIP is that we delegate multi-tenancy away from the JobManager to the resource management framework and the dispatchers. This will help make the JobManager component cleaner and simpler. The prospect of having the user jars directly in the system classpath of the workers, instead of dealing with custom class loaders, is very nice.

The model we have for acquiring and releasing resources wouldn't work particularly well with all the new deployment options, so +1 on a new task slot request/offer system and +1 for making the ResourceManager responsible for TaskManager registration and slot management. This is well aligned with the initial idea of the ResourceManager component.

We definitely need good testing for these changes, since the possibility of bugs increases with the number of additional messages introduced.

The only thing that bugs me is whether we make the Standalone mode a bit less nice to use. The initial bootstrapping of the nodes via the local dispatchers and the subsequent registration of TaskManagers and allocation of slots could cause some delay. It's not a major concern, though, because it will take little time compared to the actual job run time (unless you run a tiny WordCount).

Cheers,
Max

On Fri, Jul 22, 2016 at 9:26 PM, Stephan Ewen <se...@apache.org> wrote:
> Hi all!
>
> Here comes a pretty big FLIP: "Improvements to the Flink Deployment and
> Process Model", to better support Yarn, Mesos, Kubernetes, and whatever
> else Google, Elon Musk, and all the other folks will think up next.
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077
>
> It is a pretty big FLIP where I took input and thoughts from many people,
> like Till, Max, Xiaowei (and his colleagues), Eron, and others.
>
> The core ideas revolve around
> - making the JobManager in its core a per-job component (handle multi
> tenancy outside the JobManager)
> - making resource acquisition and release more dynamic
> - tying deployments more naturally to jobs where desirable
>
>
> Let's get the discussion started...
>
> Greetings,
> Stephan