Re: [DISCUSS] FLIP-74: Flink JobClient API

Zili Chen Thu, 26 Sep 2019 23:14:29 -0700

Thanks for your replies Kostas & Aljoscha!

Below are replies point by point.


1. For DETACHED mode, what I said there is about DETACHED mode on the
client side.
Two configurations overload the term DETACHED[1].

On the client side, it means whether or not client.submitJob blocks until
the job execution result is available.
Since client.submitJob returns CompletableFuture<JobClient>, NON-DETACHED
has no effect at all. The caller of submitJob decides whether or not to
block to get the JobClient and request the job execution result. If the
client crashes, it is a user-scope exception that should be handled in user
code; if the client loses its connection to the cluster, we have retry count
and retry interval configurations that retry automatically and throw a
user-scope exception once the retry limit is exceeded.
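
A minimal sketch of the client-side point above, assuming a hypothetical
JobClient interface and a hypothetical submitJob stand-in (neither is the
real Flink API). It only illustrates that, once submission returns a future,
"detached" versus "attached" is a choice made by the caller:

```java
import java.util.concurrent.CompletableFuture;

public class SubmitJobSketch {

    // Hypothetical stand-in for the proposed JobClient; not the real Flink interface.
    interface JobClient {
        CompletableFuture<String> getJobExecutionResult();
    }

    // Hypothetical stand-in for client.submitJob: returns a future right away.
    static CompletableFuture<JobClient> submitJob(String jobName) {
        JobClient client = () -> CompletableFuture.supplyAsync(() -> jobName + " finished");
        return CompletableFuture.completedFuture(client);
    }

    // "Detached" usage: keep the future and never wait for the job result.
    static CompletableFuture<JobClient> submitDetached(String jobName) {
        return submitJob(jobName);
    }

    // "Attached" usage: the caller chooses to block until the result arrives.
    static String submitAndWait(String jobName) {
        return submitJob(jobName)
                .thenCompose(JobClient::getJobExecutionResult)
                .join();
    }

    public static void main(String[] args) {
        submitDetached("job-a");
        System.out.println(submitAndWait("job-b"));
    }
}
```

Both usages go through the same submitJob call; only the handling of the
returned future differs.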

Your comment about polling for the job result sounds like a concern on the
cluster side.

On the cluster side, DETACHED mode exists only in JobCluster. If DETACHED is
configured, JobCluster exits when the job finishes; if NON-DETACHED is
configured, JobCluster exits once the job execution result is delivered.
FLIP-74 doesn't propose changes in this scope; it stays as is.
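
The exit rule described above can be sketched as a single predicate. This is
only an illustration: the Mode enum and the shouldExit method are hypothetical
names, not taken from the JobCluster implementation.

```java
public class JobClusterExitSketch {

    // Hypothetical representation of the cluster-side configuration.
    enum Mode { DETACHED, NON_DETACHED }

    // DETACHED: exit as soon as the job has finished.
    // NON_DETACHED: exit only after the execution result has been delivered.
    static boolean shouldExit(Mode mode, boolean jobFinished, boolean resultDelivered) {
        if (!jobFinished) {
            return false;
        }
        return mode == Mode.DETACHED || resultDelivered;
    }

    public static void main(String[] args) {
        System.out.println(shouldExit(Mode.DETACHED, true, false));
        System.out.println(shouldExit(Mode.NON_DETACHED, true, false));
    }
}
```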

However, it is an interesting part, and we can revisit this implementation a bit.

<see the next email for compact reply in this one>

2. The retrieval of JobClient is so important that, without a way to
retrieve a JobClient, it is a dumb public user-facing interface (what a
strange state :P).

About the retrieval of JobClient, as mentioned in the document, two ways
should be supported.

(1) Retrieve it as the return value of job submission.
(2) Retrieve a JobClient for an existing job (by job id).
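
The two retrieval paths above can be sketched as follows. FakeCluster, its
methods, and the minimal JobClient class are hypothetical stand-ins made up
for illustration; where retrieval by job id should actually live is exactly
what is still under discussion.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class JobClientRetrievalSketch {

    // Hypothetical handle to a job; a real JobClient would expose more.
    static final class JobClient {
        final String jobId;
        JobClient(String jobId) { this.jobId = jobId; }
    }

    // Hypothetical in-memory stand-in for a cluster that tracks its jobs.
    static final class FakeCluster {
        private final Map<String, JobClient> jobs = new ConcurrentHashMap<>();
        private final AtomicInteger counter = new AtomicInteger();

        // (1) The JobClient is the return value of job submission.
        CompletableFuture<JobClient> submitJob(String jobName) {
            String jobId = jobName + "-" + counter.incrementAndGet();
            JobClient client = new JobClient(jobId);
            jobs.put(jobId, client);
            return CompletableFuture.completedFuture(client);
        }

        // (2) A JobClient for an existing job is looked up by job id.
        Optional<JobClient> retrieveJobClient(String jobId) {
            return Optional.ofNullable(jobs.get(jobId));
        }
    }

    public static void main(String[] args) {
        FakeCluster cluster = new FakeCluster();
        JobClient submitted = cluster.submitJob("wordcount").join();
        JobClient retrieved = cluster.retrieveJobClient(submitted.jobId).get();
        System.out.println(submitted.jobId + " retrieved: " + (submitted == retrieved));
    }
}
```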

I highly respect your thoughts about how Executors should be and your
thoughts on multi-layered clients.
Although (2) is not supported by public interfaces according to the summary
of the discussion above, we can discuss the place of Executors among
multi-layered clients and find a way to retrieve the JobClient of an
existing job with the public client API. I will comment in the FLIP-73
thread[2] since it is mostly about Executors.

Best,
tison.

[1]
https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit?disco=AAAADnLLvM8
[2]
https://lists.apache.org/x/thread.html/dc3a541709f96906b43df4155373af1cd09e08c3f105b0bd0ba3fca2@%3Cdev.flink.apache.org%3E




Kostas Kloudas <[email protected]> wrote on Wed, Sep 25, 2019 at 9:29 PM:

> Hi Tison,
>
> Thanks for the FLIP and launching the discussion!
>
> As a first note, big +1 on providing/exposing a JobClient to the users!
>
> Some points that would be nice to be clarified:
> 1) You mention that we can get rid of the DETACHED mode: I agree that
> at a high level, given that everything will now be asynchronous, there
> is no need to keep the DETACHED mode but I think we should specify
> some aspects. For example, without the explicit separation of the
> modes, what happens when the job finishes? Does the client always
> poll periodically for the result, or is the result pushed when
> in NON-DETACHED mode? What happens if the client disconnects and
> reconnects?
>
> 2) On the "how to retrieve a JobClient for a running Job", I think
> this is related to the other discussion you opened in the ML about
> multi-layered clients. First of all, I agree that exposing different
> "levels" of clients would be a nice addition, and actually there have
> been some discussions about doing so in the future. Now for this
> specific discussion:
>       i) I do not think that we should expose the
> ClusterDescriptor/ClusterSpecification to the user, as this ties us to
> a specific architecture which may change in the future.
>      ii) I do not think it should be the Executor that will provide a
> JobClient for an already running job (only for the Jobs that it
> submits). The job of the executor should just be to execute() a
> pipeline.
>      iii) I think a solution that respects the separation of concerns
> could be the addition of another component (in the future), something
> like a ClientFactory, or ClusterFactory that will have methods like:
> ClusterClient createCluster(Configuration), JobClient
> retrieveJobClient(Configuration, JobId), maybe even (although not
> sure) Executor getExecutor(Configuration) and maybe more. This
> component would be responsible to interact with a cluster manager like
> Yarn and do what is now being done by the ClusterDescriptor plus some
> more stuff.
>
> Although under the hood all these abstractions (Environments,
> Executors, ...) use the same clients, I believe their
> job/existence is not contradicting but they simply hide some of the
> complexity from the user, and give us, as developers some freedom to
> change in the future some of the parts. For example, the executor will
> take a Pipeline, create a JobGraph and submit it, instead of requiring
> the user to do each step separately. This allows us to, for example,
> get rid of the Plan if in the future everything is DataStream.
> Essentially, I think of these as layers of an onion with the clients
> being close to the core. The higher you go, the more functionality is
> included and hidden from the public eye.
>
> Point iii) by the way is just a thought and by no means final. I also
> like the idea of multi-layered clients so this may spark up the
> discussion.
>
> Cheers,
> Kostas
>
> On Wed, Sep 25, 2019 at 2:21 PM Aljoscha Krettek <[email protected]>
> wrote:
> >
> > Hi Tison,
> >
> > Thanks for proposing the document! I had some comments on the document.
> >
> > I think the only complex thing that we still need to figure out is how
> > to get a JobClient for a job that is already running, as you mentioned in
> > the document. Currently I’m thinking that it’s ok to add a method to
> > Executor for retrieving a JobClient for a running job by providing an ID.
> > Let’s see what Kostas has to say on the topic.
> >
> > Best,
> > Aljoscha
> >
> > > On 25. Sep 2019, at 12:31, Zili Chen <[email protected]> wrote:
> > >
> > > Hi all,
> > >
> > > Summarizing the discussion about introducing a Flink JobClient API[1], we
> > > drafted FLIP-74[2] to gather thoughts and work towards a standard public
> > > user-facing interface.
> > >
> > > This discussion thread aims at standardizing the job-level client API. But
> > > I'd like to emphasize that how to retrieve a JobClient may lead to further
> > > discussion on the different levels of clients exposed by Flink, so a
> > > follow-up thread will be started later to coordinate FLIP-73 and FLIP-74
> > > on the exposure issue.
> > >
> > > Looking forward to your opinions.
> > >
> > > Best,
> > > tison.
> > >
> > > [1]
> > > https://lists.apache.org/thread.html/ce99cba4a10b9dc40eb729d39910f315ae41d80ec74f09a356c73938@%3Cdev.flink.apache.org%3E
> > > [2]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-74%3A+Flink+JobClient+API
> >
>
