Re: [DISCUSS] FLIP-85: Delayed Job Graph Generation

Kostas Kloudas Tue, 03 Mar 2020 02:49:57 -0800

Hi Peter,

I understand your point. This is why I was also a bit torn about the
name and my proposal was a bit aligned with yours (something along the
lines of "cluster deploy" mode).


But many of the other participants in the discussion suggested
"Application Mode". I think the reasoning is that the user's
application is now more self-contained: it is submitted to the cluster
and the user can just disconnect.
In addition, as discussed briefly in the doc, in the future there may
be better support for multi-execute applications, which would bring us
one step closer to a true "Application Mode". This is only how I
interpreted their arguments; of course they can also express their
thoughts on the topic :)
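
For readers skimming the thread, the distinction can be sketched as a toy
model. This is plain Python, not Flink code: `Submission` and `submit` are
hypothetical names made up for illustration, and the model only encodes where
`main()` runs and who builds the job graph in each mode, as described in this
discussion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """Toy record of how a (hypothetical) submission behaves in a given mode."""
    mode: str
    main_runs_on: str              # where the user's main() is executed
    client_builds_job_graph: bool  # whether the client produces the job graph


def submit(mode: str) -> Submission:
    """Describe a submission in the given mode (illustration only, not a Flink API)."""
    if mode == "per-job":
        # main() runs on the client, so the client also produces the job graph.
        return Submission(mode, "client", True)
    if mode == "application":
        # main() runs on the cluster; the client hands over the program
        # and can disconnect afterwards.
        return Submission(mode, "cluster", False)
    raise ValueError(f"unknown mode: {mode!r}")


for mode in ("per-job", "application"):
    print(submit(mode))
```

The only point of the sketch is that the two modes differ in one place: which
side executes `main()`.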

Cheers,
Kostas

On Mon, Mar 2, 2020 at 6:15 PM Peter Huang <[email protected]> wrote:
>
> Hi Kostas,
>
> Thanks for updating the wiki. We are aligned with the implementation in the
> doc. But I feel the naming is still a little confusing from a user's
> perspective. It is well known that Flink supports per-job clusters and
> session clusters. That concept sits at the layer of how a job is managed
> within Flink. The method introduced until now is a kind of mix of job and
> session cluster, as a compromise on implementation complexity. We probably
> don't need to label it as Application Mode at the same layer as per-job
> cluster and session cluster. Conceptually, I think it is still a cluster-mode
> implementation for the per-job cluster.
>
> To minimize confusion for users, I think it would be better to make it just an
> option of the per-job cluster for each type of cluster manager. What do you
> think?
>
>
> Best Regards
> Peter Huang
>
> On Mon, Mar 2, 2020 at 7:22 AM Kostas Kloudas <[email protected]> wrote:
>>
>> Hi Yang,
>>
>> The difference between per-job and application mode is that, as you
>> described, in the per-job mode the main is executed on the client
>> while in the application mode, the main is executed on the cluster.
>> I do not think we have to offer "application mode" with running the
>> main on the client side as this is exactly what the per-job mode does
>> currently and, as you described also, it would be redundant.
>>
>> Sorry if this was not clear in the document.
>>
>> Cheers,
>> Kostas
>>
>> On Mon, Mar 2, 2020 at 3:17 PM Yang Wang <[email protected]> wrote:
>> >
>> > Hi Kostas,
>> >
>> > Thanks a lot for your conclusion and for updating the FLIP-85 wiki.
>> > Currently, I have no more questions about the motivation, approach, fault
>> > tolerance, or the first-phase implementation.
>> >
>> > The new title "Flink Application Mode" makes a lot of sense to me.
>> > Especially for containerized environments, the cluster deploy option will
>> > be very useful.
>> >
>> > Just one concern: how do we introduce this new application mode to our
>> > users? Each user program (i.e. `main()`) is an application. Currently, we
>> > intend to support only one `execute()`. So what is the difference between
>> > per-job and application mode?
>> >
>> > For per-job, the user `main()` is always executed on the client side. For
>> > application mode, the user `main()` could be executed on the client or the
>> > master side (configured via a CLI option). Right? We need a clear concept.
>> > Otherwise, users will become more and more confused.
>> >
>> >
>> > Best,
>> > Yang
>> >
>> > On Mon, Mar 2, 2020 at 5:58 PM Kostas Kloudas <[email protected]> wrote:
>> >>
>> >> Hi all,
>> >>
>> >> I updated
>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Flink+Application+Mode
>> >> based on the discussion we had here:
>> >>
>> >> https://docs.google.com/document/d/1ji72s3FD9DYUyGuKnJoO4ApzV-nSsZa0-bceGXW7Ocw/edit#
>> >>
>> >> Please let me know what you think and please keep the discussion on the
>> >> ML :)
>> >>
>> >> Thanks for starting the discussion and I hope that soon we will be
>> >> able to vote on the FLIP.
>> >>
>> >> Cheers,
>> >> Kostas
>> >>
>> >> On Thu, Jan 16, 2020 at 3:40 AM Yang Wang <[email protected]> wrote:
>> >> >
>> >> > Hi all,
>> >> >
>> >> > Thanks a lot for the feedback from @Kostas Kloudas. All of your concerns
>> >> > are on point. FLIP-85 is mainly focused on supporting cluster mode for
>> >> > per-job, since it is more urgent and has many more use cases in both Yarn
>> >> > and Kubernetes deployments. For session clusters, we could have more
>> >> > discussion in a new thread later.
>> >> >
>> >> > #1, How to download the user jars and dependencies for per-job in cluster
>> >> > mode?
>> >> > For Yarn, we could register the user jars and dependencies as
>> >> > LocalResources. They will be distributed by Yarn, and once the JobManager
>> >> > and TaskManager are launched, the jars already exist.
>> >> > For standalone per-job and K8s, we expect that the user jars and
>> >> > dependencies are built into the image. Alternatively, an InitContainer
>> >> > could be used for downloading. This is natively distributed, so we will
>> >> > not have a bottleneck.
>> >> >
>> >> > #2, Job graph recovery
>> >> > We could have an optimization to store the job graph on the DFS. However,
>> >> > I suggest that building a new job graph from the configuration be the
>> >> > default option, since we will not always have a DFS store when deploying a
>> >> > Flink per-job cluster. Of course, we assume that using the same
>> >> > configuration (e.g. job_id, user_jar, main_class, main_args, parallelism,
>> >> > savepoint_settings, etc.) will produce the same job graph. I think the
>> >> > standalone per-job mode already has similar behavior.
>> >> >
>> >> > #3, What happens with jobs that have multiple execute calls?
>> >> > Currently, this is really a problem. Even if we use a local client on the
>> >> > Flink master side, it will behave differently from client mode. In client
>> >> > mode, if we execute multiple times, we will deploy a separate Flink
>> >> > cluster for each execute. I am not quite sure whether that is reasonable.
>> >> > However, I still think using the local client is a good choice. We could
>> >> > continue the discussion in a new thread. @Zili Chen <[email protected]>, do
>> >> > you want to drive this?
>> >> >
>> >> >
>> >> >
>> >> > Best,
>> >> > Yang
>> >> >
>> >> > On Thu, Jan 16, 2020 at 1:55 AM Peter Huang <[email protected]> wrote:
>> >> >
>> >> > > Hi Kostas,
>> >> > >
>> >> > > Thanks for this feedback. I couldn't agree more. The cluster mode
>> >> > > should be added first for the per-job cluster.
>> >> > >
>> >> > > 1) For the job cluster implementation
>> >> > > 1. Job graph recovery: either rebuild from configuration, or store a
>> >> > > static job graph as the session cluster does. I think the static one
>> >> > > will be better because of the shorter recovery time.
>> >> > > Let me update the doc with details.
>> >> > >
>> >> > > 2. For jobs that execute multiple times, I think @Zili Chen
>> >> > > <[email protected]> has proposed the local client solution, which can
>> >> > > actually run the program in the cluster entry point. We can put the
>> >> > > implementation in the second stage, or even in a new FLIP for further
>> >> > > discussion.
>> >> > >
>> >> > > 2) For the session cluster implementation
>> >> > > We can disable the cluster mode for the session cluster in the first
>> >> > > stage. I agree that jar downloading will be painful.
>> >> > > We can consider a PoC and performance evaluation first. If the
>> >> > > end-to-end experience is good enough, then we can consider proceeding
>> >> > > with the solution.
>> >> > >
>> >> > > Looking forward to more opinions from @Yang Wang <[email protected]>,
>> >> > > @Zili Chen <[email protected]>, and @Dian Fu <[email protected]>.
>> >> > >
>> >> > >
>> >> > > Best Regards
>> >> > > Peter Huang
>> >> > >
>> >> > > On Wed, Jan 15, 2020 at 7:50 AM Kostas Kloudas <[email protected]> wrote:
>> >> > >
>> >> > >> Hi all,
>> >> > >>
>> >> > >> I am writing here as the discussion on the Google Doc seems to be a
>> >> > >> bit difficult to follow.
>> >> > >>
>> >> > >> I think that in order to be able to make progress, it would be
>> >> > >> helpful to focus on per-job mode for now.
>> >> > >> The reasons are that:
>> >> > >>  1) making the (unique) JobSubmitHandler responsible for creating the
>> >> > >>   jobgraphs, which includes downloading dependencies, is not an
>> >> > >>   optimal solution
>> >> > >>  2) even if we put the responsibility on the JobMaster, currently each
>> >> > >>   job has its own JobMaster but they all run in the same process, so
>> >> > >>   we again have a single entity.
>> >> > >>
>> >> > >> Of course after this is done, and if we feel comfortable with the
>> >> > >> solution, then we can go to the session mode.
>> >> > >>
>> >> > >> A second comment has to do with fault tolerance in the per-job,
>> >> > >> cluster-deploy mode.
>> >> > >> In the document, it is suggested that upon recovery, the JobMaster of
>> >> > >> each job re-creates the JobGraph.
>> >> > >> I am just wondering if it is better to create and store the JobGraph
>> >> > >> upon submission and only fetch it upon recovery, so that we have a
>> >> > >> static JobGraph.
>> >> > >>
>> >> > >> Finally, I have a question: what happens with jobs that have multiple
>> >> > >> execute calls?
>> >> > >> The semantics seem to change compared to the current behaviour, right?
>> >> > >>
>> >> > >> Cheers,
>> >> > >> Kostas
>> >> > >>
>> >> > >> On Wed, Jan 8, 2020 at 8:05 PM tison <[email protected]> wrote:
>> >> > >> >
>> >> > >> > Not always. Yang Wang is also not yet a committer, but he can join
>> >> > >> > the channel. I cannot find the id by clicking “Add new member in
>> >> > >> > channel”, so I came to you and asked you to try out the link. I
>> >> > >> > will possibly find other ways, but the original purpose is that
>> >> > >> > the Slack channel is a public area where we discuss development...
>> >> > >> > Best,
>> >> > >> > tison.
>> >> > >> >
>> >> > >> >
>> >> > >> > On Thu, Jan 9, 2020 at 2:44 AM Peter Huang <[email protected]> wrote:
>> >> > >> >
>> >> > >> > > Hi Tison,
>> >> > >> > >
>> >> > >> > > I am not a Flink committer yet, so I don't think I can join it
>> >> > >> > > either.
>> >> > >> > >
>> >> > >> > >
>> >> > >> > > Best Regards
>> >> > >> > > Peter Huang
>> >> > >> > >
>> >> > >> > > On Wed, Jan 8, 2020 at 9:39 AM tison <[email protected]> wrote:
>> >> > >> > >
>> >> > >> > > > Hi Peter,
>> >> > >> > > >
>> >> > >> > > > Could you try out this link?
>> >> > >> > > > https://the-asf.slack.com/messages/CNA3ADZPH
>> >> > >> > > >
>> >> > >> > > > Best,
>> >> > >> > > > tison.
>> >> > >> > > >
>> >> > >> > > >
>> >> > >> > > > On Thu, Jan 9, 2020 at 1:22 AM Peter Huang <[email protected]> wrote:
>> >> > >> > > >
>> >> > >> > > > > Hi Tison,
>> >> > >> > > > >
>> >> > >> > > > > I can't join the group with the shared link. Would you please
>> >> > >> > > > > add me to the group? My Slack account is huangzhenqiu0825.
>> >> > >> > > > > Thank you in advance.
>> >> > >> > > > >
>> >> > >> > > > >
>> >> > >> > > > > Best Regards
>> >> > >> > > > > Peter Huang
>> >> > >> > > > >
>> >> > >> > > > > On Wed, Jan 8, 2020 at 12:02 AM tison <[email protected]> wrote:
>> >> > >> > > > >
>> >> > >> > > > > > Hi Peter,
>> >> > >> > > > > >
>> >> > >> > > > > > As described above, this effort should get attention from
>> >> > >> > > > > > the people developing FLIP-73, a.k.a. Executor
>> >> > >> > > > > > abstractions. I recommend that you join the public Slack
>> >> > >> > > > > > channel[1] for Flink Client API Enhancement, where you can
>> >> > >> > > > > > share your detailed thoughts. It will possibly get more
>> >> > >> > > > > > concrete attention.
>> >> > >> > > > > >
>> >> > >> > > > > > Best,
>> >> > >> > > > > > tison.
>> >> > >> > > > > >
>> >> > >> > > > > > [1]
>> >> > >> > > > > > https://slack.com/share/IS21SJ75H/Rk8HhUly9FuEHb7oGwBZ33uL/enQtODg2MDYwNjE5MTg3LTA2MjIzNDc1M2ZjZDVlMjdlZjk1M2RkYmJhNjAwMTk2ZDZkODQ4NmY5YmI4OGRhNWJkYTViMTM1NzlmMzc4OWM
>> >> > >> > > > > >
>> >> > >> > > > > >
>> >> > >> > > > > > On Tue, Jan 7, 2020 at 5:09 AM Peter Huang <[email protected]> wrote:
>> >> > >> > > > > >
>> >> > >> > > > > > > Dear All,
>> >> > >> > > > > > >
>> >> > >> > > > > > > Happy new year! Based on the existing feedback from the
>> >> > >> > > > > > > community, we revised the doc to cover session cluster
>> >> > >> > > > > > > support, the concrete interface changes needed, and an
>> >> > >> > > > > > > execution plan. Please take one more round of review at
>> >> > >> > > > > > > your convenience.
>> >> > >> > > > > > >
>> >> > >> > > > > > > https://docs.google.com/document/d/1aAwVjdZByA-0CHbgv16Me-vjaaDMCfhX7TzVVTuifYM/edit#
>> >> > >> > > > > > >
>> >> > >> > > > > > >
>> >> > >> > > > > > > Best Regards
>> >> > >> > > > > > > Peter Huang
>> >> > >> > > > > > >
>> >> > >> > > > > > >
>> >> > >> > > > > > >
>> >> > >> > > > > > >
>> >> > >> > > > > > >
>> >> > >> > > > > > > On Thu, Jan 2, 2020 at 11:29 AM Peter Huang <[email protected]> wrote:
>> >> > >> > > > > > >
>> >> > >> > > > > > > > Hi Dian,
>> >> > >> > > > > > > > Thanks for giving us valuable feedback.
>> >> > >> > > > > > > >
>> >> > >> > > > > > > > 1) It's better to have a whole design for this feature
>> >> > >> > > > > > > > Regarding the suggestion of also enabling the cluster
>> >> > >> > > > > > > > mode for session clusters, I think Flink already
>> >> > >> > > > > > > > supports it. WebSubmissionExtension already allows
>> >> > >> > > > > > > > users to start a job with a specified jar by using the
>> >> > >> > > > > > > > web UI. But we need to enable the feature from the CLI
>> >> > >> > > > > > > > for both local and remote jars. I will align with Yang
>> >> > >> > > > > > > > Wang first about the details and update the design doc.
>> >> > >> > > > > > > >
>> >> > >> > > > > > > > 2) It's better to consider the convenience for users,
>> >> > >> > > > > > > > such as debugging
>> >> > >> > > > > > > >
>> >> > >> > > > > > > > I am wondering whether we can store the exception from
>> >> > >> > > > > > > > jobgraph generation in the application master. As no
>> >> > >> > > > > > > > streaming graph can be scheduled in this case, no more
>> >> > >> > > > > > > > TMs will be requested from FlinkRM. If the AM is still
>> >> > >> > > > > > > > running, users can still query it from the CLI. As it
>> >> > >> > > > > > > > requires more changes, we can get some feedback from
>> >> > >> > > > > > > > <[email protected]> and @[email protected]
>> >> > >> > > > > > > > <[email protected]>.
>> >> > >> > > > > > > >
>> >> > >> > > > > > > > 3) It's better to consider the impact to the stability
>> >> > >> > > > > > > > of the cluster
>> >> > >> > > > > > > >
>> >> > >> > > > > > > > I agree with Yang Wang's opinion.
>> >> > >> > > > > > > >
>> >> > >> > > > > > > >
>> >> > >> > > > > > > >
>> >> > >> > > > > > > > Best Regards
>> >> > >> > > > > > > > Peter Huang
>> >> > >> > > > > > > >
>> >> > >> > > > > > > >
>> >> > >> > > > > > > > On Sun, Dec 29, 2019 at 9:44 PM Dian Fu <[email protected]> wrote:
>> >> > >> > > > > > > >
>> >> > >> > > > > > > >> Hi all,
>> >> > >> > > > > > > >>
>> >> > >> > > > > > > >> Sorry to jump into this discussion. Thanks everyone for
>> >> > >> > > > > > > >> the discussion. I'm very interested in this topic
>> >> > >> > > > > > > >> although I'm not an expert in this part. So I'm glad to
>> >> > >> > > > > > > >> share my thoughts as follows:
>> >> > >> > > > > > > >>
>> >> > >> > > > > > > >> 1) It's better to have a whole design for this feature
>> >> > >> > > > > > > >> As we know, there are two deployment modes: per-job
>> >> > >> > > > > > > >> mode and session mode. I'm wondering which mode really
>> >> > >> > > > > > > >> needs this feature. As the design doc mentioned,
>> >> > >> > > > > > > >> per-job mode is used more for streaming jobs and
>> >> > >> > > > > > > >> session mode is usually used for batch jobs (of course,
>> >> > >> > > > > > > >> the job types and the deployment modes are orthogonal).
>> >> > >> > > > > > > >> Usually a streaming job only needs to be submitted once
>> >> > >> > > > > > > >> and will run for days or weeks, while batch jobs will
>> >> > >> > > > > > > >> be submitted more frequently than streaming jobs. This
>> >> > >> > > > > > > >> means that session mode may also need this feature.
>> >> > >> > > > > > > >> However, if we support this feature in session mode,
>> >> > >> > > > > > > >> the application master will become the new centralized
>> >> > >> > > > > > > >> service (which should be solved). So in this case, it's
>> >> > >> > > > > > > >> better to have a complete design for both per-job mode
>> >> > >> > > > > > > >> and session mode. Furthermore, even if we can do it
>> >> > >> > > > > > > >> phase by phase, we need to have a whole picture of how
>> >> > >> > > > > > > >> it works in both per-job mode and session mode.
>> >> > >> > > > > > > >>
>> >> > >> > > > > > > >> 2) It's better to consider the convenience for users,
>> >> > >> > > > > > > >> such as debugging
>> >> > >> > > > > > > >> After we finish this feature, the job graph will be
>> >> > >> > > > > > > >> compiled in the application master, which means that
>> >> > >> > > > > > > >> users cannot easily get the exception message
>> >> > >> > > > > > > >> synchronously in the job client if there are problems
>> >> > >> > > > > > > >> during job graph compilation (especially for platform
>> >> > >> > > > > > > >> users), for example when the resource path is incorrect
>> >> > >> > > > > > > >> or the user program itself has some problems. What I'm
>> >> > >> > > > > > > >> thinking is that maybe we should throw the exceptions
>> >> > >> > > > > > > >> as early as possible (during the job submission stage).
>> >> > >> > > > > > > >>
>> >> > >> > > > > > > >> 3) It's better to consider the impact to the stability
>> >> > >> > > > > > > >> of the cluster
>> >> > >> > > > > > > >> If we perform the compilation in the application
>> >> > >> > > > > > > >> master, we should consider the impact of compilation
>> >> > >> > > > > > > >> errors. Although YARN could resume the application
>> >> > >> > > > > > > >> master in case of failures, in some cases the
>> >> > >> > > > > > > >> compilation failure may be a waste of cluster resources
>> >> > >> > > > > > > >> and may impact the stability of the cluster and the
>> >> > >> > > > > > > >> other jobs in the cluster, for example when the
>> >> > >> > > > > > > >> resource path is incorrect or the user program itself
>> >> > >> > > > > > > >> has some problems (in this case, job failover cannot
>> >> > >> > > > > > > >> solve this kind of problem). In the current
>> >> > >> > > > > > > >> implementation, compilation errors are handled on the
>> >> > >> > > > > > > >> client side and there is no impact on the cluster at
>> >> > >> > > > > > > >> all.
>> >> > >> > > > > > > >>
>> >> > >> > > > > > > >> Regarding 1), the design doc clearly points out that
>> >> > >> > > > > > > >> only per-job mode will be supported. However, I think
>> >> > >> > > > > > > >> it's better to also consider session mode in the design
>> >> > >> > > > > > > >> doc.
>> >> > >> > > > > > > >> Regarding 2) and 3), I have not seen related sections
>> >> > >> > > > > > > >> in the design doc. It would be good if we could cover
>> >> > >> > > > > > > >> them in the design doc.
>> >> > >> > > > > > > >>
>> >> > >> > > > > > > >> Feel free to correct me if there is anything I
>> >> > >> > > > > > > >> misunderstand.
>> >> > >> > > > > > > >>
>> >> > >> > > > > > > >> Regards,
>> >> > >> > > > > > > >> Dian
>> >> > >> > > > > > > >>
>> >> > >> > > > > > > >>
>> >> > >> > > > > > > >> > On Dec 27, 2019, at 3:13 AM, Peter Huang <[email protected]> wrote:
>> >> > >> > > > > > > >> >
>> >> > >> > > > > > > >> > Hi Yang,
>> >> > >> > > > > > > >> >
>> >> > >> > > > > > > >> > I can't agree more. The effort definitely needs to
>> >> > >> > > > > > > >> > align with the final goal of FLIP-73.
>> >> > >> > > > > > > >> > I am thinking about whether we can achieve the goal in
>> >> > >> > > > > > > >> > two phases.
>> >> > >> > > > > > > >> >
>> >> > >> > > > > > > >> > 1) Phase I
>> >> > >> > > > > > > >> > As the CliFrontend will not be deprecated soon, we can
>> >> > >> > > > > > > >> > still use the deployMode flag there, pass the program
>> >> > >> > > > > > > >> > info through the Flink configuration, and use the
>> >> > >> > > > > > > >> > ClassPathJobGraphRetriever to generate the job graph in
>> >> > >> > > > > > > >> > the ClusterEntrypoints of Yarn and Kubernetes.
>> >> > >> > > > > > > >> >
>> >> > >> > > > > > > >> > 2) Phase II
>> >> > >> > > > > > > >> > In AbstractJobClusterExecutor, the job graph is
>> >> > >> > > > > > > >> > generated in the execute function. We can still use the
>> >> > >> > > > > > > >> > deployMode in it. With deployMode = cluster, the
>> >> > >> > > > > > > >> > execute function only starts the cluster.
>> >> > >> > > > > > > >> >
>> >> > >> > > > > > > >> > When the {Yarn/Kubernetes}PerJobClusterEntrypoint
>> >> > >> > > > > > > >> > starts, it will start the dispatcher first; then we can
>> >> > >> > > > > > > >> > use a ClusterEnvironment similar to ContextEnvironment
>> >> > >> > > > > > > >> > to submit the job with jobName to the local dispatcher.
>> >> > >> > > > > > > >> > For the details, we need more investigation. Let's wait
>> >> > >> > > > > > > >> > for @Aljoscha Krettek <[email protected]> and @Till
>> >> > >> > > > > > > >> > Rohrmann <[email protected]>'s feedback after the
>> >> > >> > > > > > > >> > holiday season.
>> >> > >> > > > > > > >> >
>> >> > >> > > > > > > >> > Thank you in advance. Merry Christmas and Happy New
>> >> > >> > > > > > > >> > Year!!!
>> >> > >> > > > > > > >> >
>> >> > >> > > > > > > >> >
>> >> > >> > > > > > > >> > Best Regards
>> >> > >> > > > > > > >> > Peter Huang
>> >> > >> > > > > > > >> >
>> >> > >> > > > > > > >> >
>> >> > >> > > > > > > >> >
>> >> > >> > > > > > > >> >
>> >> > >> > > > > > > >> >
>> >> > >> > > > > > > >> >
>> >> > >> > > > > > > >> >
>> >> > >> > > > > > > >> >
>> >> > >> > > > > > > >> > On Wed, Dec 25, 2019 at 1:08 AM Yang Wang <[email protected]> wrote:
>> >> > >> > > > > > > >> >
>> >> > >> > > > > > > >> >> Hi Peter,
>> >> > >> > > > > > > >> >>
>> >> > >> > > > > > > >> >> I think we need to reconsider tison's suggestion
>> >> > >> > > > > > > >> >> seriously. After FLIP-73, deployJobCluster has been
>> >> > >> > > > > > > >> >> moved into `JobClusterExecutor#execute`. It should not
>> >> > >> > > > > > > >> >> be visible to `CliFrontend`. That means the user
>> >> > >> > > > > > > >> >> program will *ALWAYS* be executed on the client side.
>> >> > >> > > > > > > >> >> This is the by-design behavior.
>> >> > >> > > > > > > >> >> So, we cannot just add `if (client mode) ... else if
>> >> > >> > > > > > > >> >> (cluster mode) ...` code in `CliFrontend` to bypass the
>> >> > >> > > > > > > >> >> executor. We need to find a clean way to decouple
>> >> > >> > > > > > > >> >> executing the user program from deploying the per-job
>> >> > >> > > > > > > >> >> cluster. Based on this, we could support executing the
>> >> > >> > > > > > > >> >> user program on the client or the master side.
>> >> > >> > > > > > > >> >>
>> >> > >> > > > > > > >> >> Maybe Aljoscha and Jeff could give some good
>> >> > >> > > > > > > >> >> suggestions.
>> >> > >> > > > > > > >> >>
>> >> > >> > > > > > > >> >>
>> >> > >> > > > > > > >> >>
>> >> > >> > > > > > > >> >> Best,
>> >> > >> > > > > > > >> >> Yang
>> >> > >> > > > > > > >> >>
>> >> > >> > > > > > > >> >> On Wed, Dec 25, 2019 at 4:03 AM Peter Huang <[email protected]> wrote:
>> >> > >> > > > > > > >> >>
>> >> > >> > > > > > > >> >>> Hi Jingjing,
>> >> > >> > > > > > > >> >>>
>> >> > >> > > > > > > >> >>> The improvement proposed is a deployment option for
>> >> > >> > > > > > > >> >>> the CLI. For SQL-based Flink applications, it is more
>> >> > >> > > > > > > >> >>> convenient to use the existing model in SqlClient, in
>> >> > >> > > > > > > >> >>> which the job graph is generated within SqlClient.
>> >> > >> > > > > > > >> >>> After adding the delayed job graph generation, I think
>> >> > >> > > > > > > >> >>> no change is needed on your side.
>> >> > >> > > > > > > >> >>>
>> >> > >> > > > > > > >> >>>
>> >> > >> > > > > > > >> >>> Best Regards
>> >> > >> > > > > > > >> >>> Peter Huang
>> >> > >> > > > > > > >> >>>
>> >> > >> > > > > > > >> >>>
>> >> > >> > > > > > > >> >>> On Wed, Dec 18, 2019 at 6:01 AM jingjing bai <[email protected]> wrote:
>> >> > >> > > > > > > >> >>>
>> >> > >> > > > > > > >> >>>> hi peter:
>> >> > >> > > > > > > >> >>>>    We have extended SqlClient to support SQL job
>> >> > >> > > > > > > >> >>>> submission from the web, based on Flink 1.9. We
>> >> > >> > > > > > > >> >>>> support submitting to Yarn in per-job mode too.
>> >> > >> > > > > > > >> >>>>    In this case, the job graph is generated on the
>> >> > >> > > > > > > >> >>>> client side. I think this discussion is mainly about
>> >> > >> > > > > > > >> >>>> improving API programs, but in my case there is no
>> >> > >> > > > > > > >> >>>> jar to upload, only a SQL string.
>> >> > >> > > > > > > >> >>>>    Do you have more suggestions for improving SQL
>> >> > >> > > > > > > >> >>>> mode, or is it only a switch for API programs?
>> >> > >> > > > > > > >> >>>>
>> >> > >> > > > > > > >> >>>>
>> >> > >> > > > > > > >> >>>> best
>> >> > >> > > > > > > >> >>>> bai jj
>> >> > >> > > > > > > >> >>>>
>> >> > >> > > > > > > >> >>>>
>> >> > >> > > > > > > >> >>>> On Wed, Dec 18, 2019 at 7:21 PM Yang Wang <[email protected]> wrote:
>> >> > >> > > > > > > >> >>>>
>> >> > >> > > > > > > >> >>>>> I just want to revive this discussion.
>> >> > >> > > > > > > >> >>>>>
>> >> > >> > > > > > > >> >>>>> Recently, i am thinking about how to natively 
>> >> > >> > > > > > > >> >>>>> run
>> >> > >> flink
>> >> > >> > > > > per-job
>> >> > >> > > > > > > >> >>> cluster on
>> >> > >> > > > > > > >> >>>>> Kubernetes.
>> >> > >> > > > > > > >> >>>>> The per-job mode on Kubernetes is very different
>> >> > >> from on
>> >> > >> > > > Yarn.
>> >> > >> > > > > > And
>> >> > >> > > > > > > >> we
>> >> > >> > > > > > > >> >>> will
>> >> > >> > > > > > > >> >>>>> have
>> >> > >> > > > > > > >> >>>>> the same deployment requirements to the client 
>> >> > >> > > > > > > >> >>>>> and
>> >> > >> entry
>> >> > >> > > > > point.
>> >> > >> > > > > > > >> >>>>>
>> >> > >> > > > > > > >> >>>>> 1. The Flink client does not always need a local
>> >> > >> > > > > > > >> >>>>> jar to start a Flink per-job cluster. We could
>> >> > >> > > > > > > >> >>>>> support multiple URI schemes. For example,
>> >> > >> > > > > > > >> >>>>> file:///path/of/my.jar means a jar located on the
>> >> > >> > > > > > > >> >>>>> client side,
>> >> > >> > > > > > > >> >>>>> hdfs://myhdfs/user/myname/flink/my.jar means a jar
>> >> > >> > > > > > > >> >>>>> located on remote HDFS, and
>> >> > >> > > > > > > >> >>>>> local:///path/in/image/my.jar means a jar located
>> >> > >> > > > > > > >> >>>>> on the jobmanager side.
>> >> > >> > > > > > > >> >>>>>
>> >> > >> > > > > > > >> >>>>> 2. Support running the user program on the master
>> >> > >> > > > > > > >> >>>>> side. This also means the entry point will
>> >> > >> > > > > > > >> >>>>> generate the job graph on the master side. We
>> >> > >> > > > > > > >> >>>>> could use the ClasspathJobGraphRetriever or start
>> >> > >> > > > > > > >> >>>>> a local Flink client to achieve this purpose.
>> >> > >> > > > > > > >> >>>>>
>> >> > >> > > > > > > >> >>>>>
>> >> > >> > > > > > > >> >>>>> cc tison, Aljoscha & Kostas: do you think this is
>> >> > >> > > > > > > >> >>>>> the right direction for us to work in?
>> >> > >> > > > > > > >> >>>>>
>> >> > >> > > > > > > >> >>>>> On Thu, Dec 12, 2019 at 4:48 PM tison <[email protected]> wrote:
>> >> > >> > > > > > > >> >>>>>
>> >> > >> > > > > > > >> >>>>>> A quick idea is that we separate the deployment
>> >> > >> > > > > > > >> >>>>>> from the user program, so that it is always done
>> >> > >> > > > > > > >> >>>>>> outside the program. When the user program is
>> >> > >> > > > > > > >> >>>>>> executed, there is always a ClusterClient that
>> >> > >> > > > > > > >> >>>>>> communicates with an existing cluster, remote or
>> >> > >> > > > > > > >> >>>>>> local. It will be another thread, so this is
>> >> > >> > > > > > > >> >>>>>> just for your information.
>> >> > >> > > > > > > >> >>>>>>
>> >> > >> > > > > > > >> >>>>>> Best,
>> >> > >> > > > > > > >> >>>>>> tison.
>> >> > >> > > > > > > >> >>>>>>
>> >> > >> > > > > > > >> >>>>>>
>> >> > >> > > > > > > >> >>>>>> On Thu, Dec 12, 2019 at 4:40 PM tison <[email protected]> wrote:
>> >> > >> > > > > > > >> >>>>>>
>> >> > >> > > > > > > >> >>>>>>> Hi Peter,
>> >> > >> > > > > > > >> >>>>>>>
>> >> > >> > > > > > > >> >>>>>>> Another concern I realized recently is that
>> >> > >> > > > > > > >> >>>>>>> with the current Executors abstraction
>> >> > >> > > > > > > >> >>>>>>> (FLIP-73), I'm afraid the user program is
>> >> > >> > > > > > > >> >>>>>>> designed to ALWAYS run on the client side.
>> >> > >> > > > > > > >> >>>>>>> Specifically, we deploy the job in the executor
>> >> > >> > > > > > > >> >>>>>>> when env.execute is called. This abstraction
>> >> > >> > > > > > > >> >>>>>>> possibly prevents Flink from running the user
>> >> > >> > > > > > > >> >>>>>>> program on the cluster side.
>> >> > >> > > > > > > >> >>>>>>>
>> >> > >> > > > > > > >> >>>>>>> For your proposal, in this case we have already
>> >> > >> > > > > > > >> >>>>>>> compiled the program and run it on the client
>> >> > >> > > > > > > >> >>>>>>> side, so even if we deploy a cluster and
>> >> > >> > > > > > > >> >>>>>>> retrieve the job graph from program metadata,
>> >> > >> > > > > > > >> >>>>>>> it doesn't make much sense.
>> >> > >> > > > > > > >> >>>>>>>
>> >> > >> > > > > > > >> >>>>>>> cc Aljoscha & Kostas, what do you think about
>> >> > >> > > > > > > >> >>>>>>> this constraint?
>> >> > >> > > > > > > >> >>>>>>>
>> >> > >> > > > > > > >> >>>>>>> Best,
>> >> > >> > > > > > > >> >>>>>>> tison.
>> >> > >> > > > > > > >> >>>>>>>
>> >> > >> > > > > > > >> >>>>>>>
>> >> > >> > > > > > > >> >>>>>>> On Tue, Dec 10, 2019 at 12:45 PM Peter Huang <[email protected]> wrote:
>> >> > >> > > > > > > >> >>>>>>>
>> >> > >> > > > > > > >> >>>>>>>> Hi Tison,
>> >> > >> > > > > > > >> >>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>> Yes, you are right. I think I made the wrong
>> >> > >> > > > > > > >> >>>>>>>> argument in the doc. Basically, the packaging
>> >> > >> > > > > > > >> >>>>>>>> jar problem is only for platform users. In our
>> >> > >> > > > > > > >> >>>>>>>> internal deploy service, we further optimized
>> >> > >> > > > > > > >> >>>>>>>> the deployment latency by letting users
>> >> > >> > > > > > > >> >>>>>>>> package flink-runtime together with the uber
>> >> > >> > > > > > > >> >>>>>>>> jar, so that we don't need to consider
>> >> > >> > > > > > > >> >>>>>>>> multiple Flink version support for now. In the
>> >> > >> > > > > > > >> >>>>>>>> session client mode, Flink libs will be
>> >> > >> > > > > > > >> >>>>>>>> shipped anyway as local resources of Yarn, so
>> >> > >> > > > > > > >> >>>>>>>> users actually don't need to package those
>> >> > >> > > > > > > >> >>>>>>>> libs into the job jar.
>> >> > >> > > > > > > >> >>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>> Best Regards
>> >> > >> > > > > > > >> >>>>>>>> Peter Huang
>> >> > >> > > > > > > >> >>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>> On Mon, Dec 9, 2019 at 8:35 PM tison <[email protected]> wrote:
>> >> > >> > > > > > > >> >>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>>> 3. What do you mean about the package? Do
>> >> > >> > > > > > > >> >>>>>>>>>> users need to compile their jars including
>> >> > >> > > > > > > >> >>>>>>>>>> flink-clients, flink-optimizer, flink-table
>> >> > >> > > > > > > >> >>>>>>>>>> code?
>> >> > >> > > > > > > >> >>>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>> The answer should be no, because they exist
>> >> > >> > > > > > > >> >>>>>>>>> in the system classpath.
>> >> > >> > > > > > > >> >>>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>> Best,
>> >> > >> > > > > > > >> >>>>>>>>> tison.
>> >> > >> > > > > > > >> >>>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>> On Tue, Dec 10, 2019 at 12:18 PM Yang Wang <[email protected]> wrote:
>> >> > >> > > > > > > >> >>>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>>> Hi Peter,
>> >> > >> > > > > > > >> >>>>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>>> Thanks a lot for starting this discussion.
>> >> > >> > > > > > > >> >>>>>>>>>> I think this is a very useful feature.
>> >> > >> > > > > > > >> >>>>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>>> This is not only relevant for Yarn: I am
>> >> > >> > > > > > > >> >>>>>>>>>> focused on the Flink on Kubernetes
>> >> > >> > > > > > > >> >>>>>>>>>> integration and have come across the same
>> >> > >> > > > > > > >> >>>>>>>>>> problem. I do not want the job graph
>> >> > >> > > > > > > >> >>>>>>>>>> generated on the client side. Instead, the
>> >> > >> > > > > > > >> >>>>>>>>>> user jars are built into a user-defined
>> >> > >> > > > > > > >> >>>>>>>>>> image. When the job manager is launched, we
>> >> > >> > > > > > > >> >>>>>>>>>> just need to generate the job graph based on
>> >> > >> > > > > > > >> >>>>>>>>>> the local user jars.
>> >> > >> > > > > > > >> >>>>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>>> I have some small suggestions about this.
>> >> > >> > > > > > > >> >>>>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>>> 1. `ProgramJobGraphRetriever` is very
>> >> > >> > > > > > > >> >>>>>>>>>> similar to `ClasspathJobGraphRetriever`; the
>> >> > >> > > > > > > >> >>>>>>>>>> difference is that the former needs
>> >> > >> > > > > > > >> >>>>>>>>>> `ProgramMetadata` and the latter needs some
>> >> > >> > > > > > > >> >>>>>>>>>> arguments. Is it possible to have a unified
>> >> > >> > > > > > > >> >>>>>>>>>> `JobGraphRetriever` to support both?
>> >> > >> > > > > > > >> >>>>>>>>>> 2. Is it possible to not use a local user
>> >> > >> > > > > > > >> >>>>>>>>>> jar to start a per-job cluster? In your
>> >> > >> > > > > > > >> >>>>>>>>>> case, the user jars already exist on HDFS
>> >> > >> > > > > > > >> >>>>>>>>>> and we do need to download the jars to the
>> >> > >> > > > > > > >> >>>>>>>>>> deployer service. Currently, we always need
>> >> > >> > > > > > > >> >>>>>>>>>> a local user jar to start a Flink cluster.
>> >> > >> > > > > > > >> >>>>>>>>>> It would be great if we could support
>> >> > >> > > > > > > >> >>>>>>>>>> remote user jars.
>> >> > >> > > > > > > >> >>>>>>>>>>>> In the implementation, we assume users
>> >> > >> > > > > > > >> >>>>>>>>>>>> package flink-clients, flink-optimizer,
>> >> > >> > > > > > > >> >>>>>>>>>>>> flink-table together within the job jar.
>> >> > >> > > > > > > >> >>>>>>>>>>>> Otherwise, the job graph generation within
>> >> > >> > > > > > > >> >>>>>>>>>>>> JobClusterEntryPoint will fail.
>> >> > >> > > > > > > >> >>>>>>>>>> 3. What do you mean about the package? Do
>> >> > >> > > > > > > >> >>>>>>>>>> users need to compile their jars including
>> >> > >> > > > > > > >> >>>>>>>>>> flink-clients, flink-optimizer, flink-table
>> >> > >> > > > > > > >> >>>>>>>>>> code?
>> >> > >> > > > > > > >> >>>>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>>> Best,
>> >> > >> > > > > > > >> >>>>>>>>>> Yang
>> >> > >> > > > > > > >> >>>>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>>> On Tue, Dec 10, 2019 at 2:37 AM Peter Huang <[email protected]> wrote:
>> >> > >> > > > > > > >> >>>>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>>>> Dear All,
>> >> > >> > > > > > > >> >>>>>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>>>> Recently, the Flink community started to
>> >> > >> > > > > > > >> >>>>>>>>>>> improve the Yarn cluster descriptor to make
>> >> > >> > > > > > > >> >>>>>>>>>>> the job jar and config files configurable
>> >> > >> > > > > > > >> >>>>>>>>>>> from the CLI. It improves the flexibility
>> >> > >> > > > > > > >> >>>>>>>>>>> of Flink deployment in Yarn per-job mode.
>> >> > >> > > > > > > >> >>>>>>>>>>> For platform users who manage tens or
>> >> > >> > > > > > > >> >>>>>>>>>>> hundreds of streaming pipelines for the
>> >> > >> > > > > > > >> >>>>>>>>>>> whole org or company, we found that job
>> >> > >> > > > > > > >> >>>>>>>>>>> graph generation on the client side is
>> >> > >> > > > > > > >> >>>>>>>>>>> another pain point. Thus, we want to
>> >> > >> > > > > > > >> >>>>>>>>>>> propose a configurable feature for
>> >> > >> > > > > > > >> >>>>>>>>>>> FlinkYarnSessionCli. The feature allows
>> >> > >> > > > > > > >> >>>>>>>>>>> users to choose job graph generation in the
>> >> > >> > > > > > > >> >>>>>>>>>>> Flink ClusterEntryPoint, so that the job
>> >> > >> > > > > > > >> >>>>>>>>>>> jar doesn't need to be available locally
>> >> > >> > > > > > > >> >>>>>>>>>>> for the job graph generation. The proposal
>> >> > >> > > > > > > >> >>>>>>>>>>> is organized as a FLIP:
>> >> > >> > > > > > > >> >>>>>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Delayed+JobGraph+Generation
>> >> > >> > > > > > > >> >>>>>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>>>> Any questions and suggestions are welcome.
>> >> > >> > > > > > > >> >>>>>>>>>>> Thank you in advance.
>> >> > >> > > > > > > >> >>>>>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>>>> Best Regards
>> >> > >> > > > > > > >> >>>>>>>>>>> Peter Huang
>> >> > >> > > > > > > >> >>>>>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>>
>> >> > >> > > > > > > >> >>>>>>>
>> >> > >> > > > > > > >> >>>>>>
>> >> > >> > > > > > > >> >>>>>
>> >> > >> > > > > > > >> >>>>
>> >> > >> > > > > > > >> >>>
>> >> > >> > > > > > > >> >>
>> >> > >> > > > > > > >>
>> >> > >> > > > > > > >>
>> >> > >> > > > > > >
>> >> > >> > > > > >
>> >> > >> > > > >
>> >> > >> > > >
>> >> > >> > >
>> >> > >>
>> >> > >
>> >>
