Sorry for joining the discussion late. I'm leaning towards deprecating the per-job mode soonish, and eventually dropping it in the long-term. - One less deployment mode makes it easier for users (especially newcomers) to understand. Deprecating the per-job mode sends the signal that it is legacy, not recommended, and in most cases users do not need to care about it. - For most (if not all) user demands that are satisfied by the per-job mode but not by the application mode, AFAICS, they can be either workaround or eventually addressed by the application mode. E.g., make application mode support shipping local dependencies. - I'm not sure about dropping the per-job mode soonish, as many users are still working with it. We'd better not force these users to migrate to the application mode when upgrading the Flink version.
Thank you~ Xintong Song On Fri, Jan 21, 2022 at 4:30 PM Konstantin Knauf <kna...@apache.org> wrote: > Thanks Thomas & Biao for your feedback. > > Any additional opinions on how we should proceed with per job-mode? As you > might have guessed, I am leaning towards proposing to deprecate per-job > mode. > > On Thu, Jan 13, 2022 at 5:11 PM Thomas Weise <t...@apache.org> wrote: > >> Regarding session mode: >> >> ## Session Mode >> * main() method executed in client >> >> Session mode also supports execution of the main method on Jobmanager >> with submission through REST API. That's how Flinkk k8s operators like >> [1] work. It's actually an important capability because it allows for >> allocation of the cluster resources prior to taking down the previous >> job during upgrade when the goal is optimization for availability. >> >> Thanks, >> Thomas >> >> [1] https://github.com/lyft/flinkk8soperator >> >> On Thu, Jan 13, 2022 at 12:32 AM Konstantin Knauf <kna...@apache.org> >> wrote: >> > >> > Hi everyone, >> > >> > I would like to discuss and understand if the benefits of having Per-Job >> > Mode in Apache Flink outweigh its drawbacks. >> > >> > >> > *# Background: Flink's Deployment Modes* >> > Flink currently has three deployment modes. They differ in the following >> > dimensions: >> > * main() method executed on Jobmanager or Client >> > * dependencies shipped by client or bundled with all nodes >> > * number of jobs per cluster & relationship between job and cluster >> > lifecycle* (supported resource providers) >> > >> > ## Application Mode >> > * main() method executed on Jobmanager >> > * dependencies already need to be available on all nodes >> > * dedicated cluster for all jobs executed from the same main()-method >> > (Note: applications with more than one job, currently still significant >> > limitations like missing high-availability). Technically, a session >> cluster >> > dedicated to all jobs submitted from the same main() method. >> > * supported by standalone, native kubernetes, YARN >> > >> > ## Session Mode >> > * main() method executed in client >> > * dependencies are distributed from and by the client to all nodes >> > * cluster is shared by multiple jobs submitted from different clients, >> > independent lifecycle >> > * supported by standalone, Native Kubernetes, YARN >> > >> > ## Per-Job Mode >> > * main() method executed in client >> > * dependencies are distributed from and by the client to all nodes >> > * dedicated cluster for a single job >> > * supported by YARN only >> > >> > >> > *# Reasons to Keep** There are use cases where you might need the >> > combination of a single job per cluster, but main() method execution in >> the >> > client. This combination is only supported by per-job mode. >> > * It currently exists. Existing users will need to migrate to either >> > session or application mode. >> > >> > >> > *# Reasons to Drop** With Per-Job Mode and Application Mode we have two >> > modes that for most users probably do the same thing. Specifically, for >> > those users that don't care where the main() method is executed and >> want to >> > submit a single job per cluster. Having two ways to do the same thing is >> > confusing. >> > * Per-Job Mode is only supported by YARN anyway. If we keep it, we >> should >> > work towards support in Kubernetes and Standalone, too, to reduce >> special >> > casing. >> > * Dropping per-job mode would reduce complexity in the code and allow >> us to >> > dedicate more resources to the other two deployment modes. >> > * I believe with session mode and application mode we have to easily >> > distinguishable and understandable deployment modes that cover Flink's >> use >> > cases: >> > * session mode: olap-style, interactive jobs/queries, short lived >> batch >> > jobs, very small jobs, traditional cluster-centric deployment mode (fits >> > the "Hadoop world") >> > * application mode: long-running streaming jobs, large scale & >> > heterogenous jobs (resource isolation!), application-centric deployment >> > mode (fits the "Kubernetes world") >> > >> > >> > *# Call to Action* >> > * Do you use per-job mode? If so, why & would you be able to migrate to >> one >> > of the other methods? >> > * Am I missing any pros/cons? >> > * Are you in favor of dropping per-job mode midterm? >> > >> > Cheers and thank you, >> > >> > Konstantin >> > >> > -- >> > >> > Konstantin Knauf >> > >> > https://twitter.com/snntrable >> > >> > https://github.com/knaufk >> > > > -- > > Konstantin Knauf > > https://twitter.com/snntrable > > https://github.com/knaufk >