Re: [DISCUSS] Future of Per-Job Mode

Xintong Song Mon, 24 Jan 2022 01:00:41 -0800

Sorry for joining the discussion late.

I'm leaning towards deprecating the per-job mode soonish, and eventually
dropping it in the long term.
- One less deployment mode makes it easier for users (especially newcomers)
to understand. Deprecating the per-job mode sends the signal that it is
legacy, not recommended, and in most cases users do not need to care about
it.
- For most (if not all) user demands that are satisfied by the per-job mode
but not by the application mode, AFAICS, they can either be worked around or
eventually addressed by the application mode. E.g., by making application
mode support shipping local dependencies.
- I'm not sure about dropping the per-job mode soonish, as many users are
still working with it. We'd better not force these users to migrate to the
application mode when upgrading the Flink version.
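
To make the gap between per-job and application mode concrete, here is a
rough sketch that encodes the three modes along the dimensions listed in
Konstantin's original mail (quoted below) and lists where two modes differ.
This is not a Flink API: the class, field names, and value strings are
hypothetical and only mirror the simplified descriptions from that mail.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DeploymentMode:
    """Hypothetical record; the fields mirror the dimensions in the original mail."""
    name: str
    main_runs_on: str        # "client" or "jobmanager"
    dependencies: str        # how user dependencies reach the cluster
    cluster_scope: str       # relationship between jobs and the cluster
    resource_providers: frozenset


# Simplified view taken from the original mail. As noted further down the
# thread, session mode can also run main() on the JobManager via REST submission.
MODES = {
    "application": DeploymentMode(
        "application", "jobmanager", "available on all nodes",
        "jobs of one main()",
        frozenset({"standalone", "native kubernetes", "yarn"})),
    "session": DeploymentMode(
        "session", "client", "shipped by client",
        "shared by many jobs",
        frozenset({"standalone", "native kubernetes", "yarn"})),
    "per-job": DeploymentMode(
        "per-job", "client", "shipped by client",
        "single job",
        frozenset({"yarn"})),
}


def differences(a: DeploymentMode, b: DeploymentMode) -> dict:
    """Return {dimension: (a_value, b_value)} for every dimension where a and b differ."""
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(a)
        if f.name != "name" and getattr(a, f.name) != getattr(b, f.name)
    }


def modes_matching(**wanted) -> list:
    """Return the sorted names of the modes whose fields equal all given values."""
    return sorted(
        m.name for m in MODES.values()
        if all(getattr(m, key) == value for key, value in wanted.items())
    )


if __name__ == "__main__":
    for dim, (pj, app) in differences(MODES["per-job"], MODES["application"]).items():
        print(f"{dim}: per-job={pj!r} vs application={app!r}")
    print(modes_matching(main_runs_on="client", cluster_scope="single job"))
```

Under these assumptions, the per-job/application difference includes where
main() runs and how dependencies are shipped, which is the gap the second
point above suggests closing on the application mode side.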


Thank you~

Xintong Song



On Fri, Jan 21, 2022 at 4:30 PM Konstantin Knauf <kna...@apache.org> wrote:

> Thanks Thomas & Biao for your feedback.
>
> Any additional opinions on how we should proceed with per-job mode? As you
> might have guessed, I am leaning towards proposing to deprecate per-job
> mode.
>
> On Thu, Jan 13, 2022 at 5:11 PM Thomas Weise <t...@apache.org> wrote:
>
>> Regarding session mode:
>>
>> ## Session Mode
>> * main() method executed in client
>>
>> Session mode also supports execution of the main method on Jobmanager
>> with submission through REST API. That's how Flink k8s operators like
>> [1] work. It's actually an important capability because it allows for
>> allocation of the cluster resources prior to taking down the previous
>> job during upgrade when the goal is optimization for availability.
>>
>> Thanks,
>> Thomas
>>
>> [1] https://github.com/lyft/flinkk8soperator
>>
>> On Thu, Jan 13, 2022 at 12:32 AM Konstantin Knauf <kna...@apache.org>
>> wrote:
>> >
>> > Hi everyone,
>> >
>> > I would like to discuss and understand if the benefits of having Per-Job
>> > Mode in Apache Flink outweigh its drawbacks.
>> >
>> >
>> > *# Background: Flink's Deployment Modes*
>> > Flink currently has three deployment modes. They differ in the following
>> > dimensions:
>> > * main() method executed on Jobmanager or Client
>> > * dependencies shipped by client or bundled with all nodes
>> > * number of jobs per cluster & relationship between job and cluster
>> > lifecycle
>> > * (supported resource providers)
>> >
>> > ## Application Mode
>> > * main() method executed on Jobmanager
>> > * dependencies already need to be available on all nodes
>> > * dedicated cluster for all jobs executed from the same main()-method
>> > (Note: applications with more than one job currently still have significant
>> > limitations like missing high-availability). Technically, a session cluster
>> > dedicated to all jobs submitted from the same main() method.
>> > * supported by standalone, Native Kubernetes, YARN
>> >
>> > ## Session Mode
>> > * main() method executed in client
>> > * dependencies are distributed from and by the client to all nodes
>> > * cluster is shared by multiple jobs submitted from different clients,
>> > independent lifecycle
>> > * supported by standalone, Native Kubernetes, YARN
>> >
>> > ## Per-Job Mode
>> > * main() method executed in client
>> > * dependencies are distributed from and by the client to all nodes
>> > * dedicated cluster for a single job
>> > * supported by YARN only
>> >
>> >
>> > *# Reasons to Keep*
>> > * There are use cases where you might need the
>> > combination of a single job per cluster, but main() method execution in the
>> > client. This combination is only supported by per-job mode.
>> > * It currently exists. Existing users will need to migrate to either
>> > session or application mode.
>> >
>> >
>> > *# Reasons to Drop*
>> > * With Per-Job Mode and Application Mode we have two
>> > modes that for most users probably do the same thing. Specifically, for
>> > those users that don't care where the main() method is executed and want to
>> > submit a single job per cluster. Having two ways to do the same thing is
>> > confusing.
>> > * Per-Job Mode is only supported by YARN anyway. If we keep it, we should
>> > work towards support in Kubernetes and Standalone, too, to reduce special
>> > casing.
>> > * Dropping per-job mode would reduce complexity in the code and allow us to
>> > dedicate more resources to the other two deployment modes.
>> > * I believe with session mode and application mode we have two easily
>> > distinguishable and understandable deployment modes that cover Flink's use
>> > cases:
>> >    * session mode: OLAP-style, interactive jobs/queries, short-lived batch
>> > jobs, very small jobs, traditional cluster-centric deployment mode (fits
>> > the "Hadoop world")
>> >    * application mode: long-running streaming jobs, large scale &
>> > heterogeneous jobs (resource isolation!), application-centric deployment
>> > mode (fits the "Kubernetes world")
>> >
>> >
>> > *# Call to Action*
>> > * Do you use per-job mode? If so, why & would you be able to migrate to one
>> > of the other methods?
>> > * Am I missing any pros/cons?
>> > * Are you in favor of dropping per-job mode midterm?
>> >
>> > Cheers and thank you,
>> >
>> > Konstantin
>> >
>> > --
>> >
>> > Konstantin Knauf
>> >
>> > https://twitter.com/snntrable
>> >
>> > https://github.com/knaufk
>>
>
>
> --
>
> Konstantin Knauf
>
> https://twitter.com/snntrable
>
> https://github.com/knaufk
>
