Hi all, just a remark about the Flink REST APIs (and their client as well): almost every time we need a way to dynamically know which jobs are contained in a jar file, and this could be exposed by the REST endpoint under /jars/:jarid/entry-points (a simple way to implement this would be to check the value of Main-class or Main-classes inside the Manifest of the jar, if it exists [1]).
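For illustration, here is a minimal sketch of the manifest check such an endpoint could perform, using only plain java.util.jar (this is not actual Flink code; the multi-job Main-classes attribute proposed in FLINK-10864 is not shown, only the standard Main-Class attribute):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class EntryPointProbe {

    /** Returns the Main-Class manifest attribute of the jar, or null if absent. */
    static String mainClassOf(File jar) throws Exception {
        try (JarFile jf = new JarFile(jar)) {
            Manifest mf = jf.getManifest();
            if (mf == null) {
                return null;
            }
            return mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws Exception {
        // Build a throwaway jar with a Main-Class entry, just to probe it.
        // "org.example.WordCount" is a made-up class name for the demo.
        Manifest mf = new Manifest();
        mf.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        mf.getMainAttributes().put(Attributes.Name.MAIN_CLASS, "org.example.WordCount");
        File jar = File.createTempFile("probe", ".jar");
        jar.deleteOnExit();
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar), mf)) {
            // No entries needed; the constructor already wrote META-INF/MANIFEST.MF.
        }

        String main = mainClassOf(jar);
        if (!"org.example.WordCount".equals(main)) {
            throw new AssertionError("unexpected main class: " + main);
        }
        System.out.println(main);
    }
}
```

The endpoint itself would do the read-only half of this (mainClassOf) against the uploaded jar and return the class name(s) as JSON.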
I understand that this is something that is not strictly required to execute Flink jobs, but IMHO it would ease A LOT the work of UI developers, who would have a way to show users all the available jobs inside a jar plus their configurable parameters. For example, right now in the WebUI you can upload a jar, but then you have to set the main class and its params (for example using a string like --param1 xx --param2 yy) without any autocomplete or UI support. Adding this functionality to the REST API and the respective client would enable the WebUI (and all UIs interacting with a Flink cluster) to prefill a dropdown list containing the entry-point classes (i.e. Flink jobs) and, once one is selected, their required (typed) parameters.

Best,
Flavio

[1] https://issues.apache.org/jira/browse/FLINK-10864

On Fri, Sep 27, 2019 at 9:16 AM Zili Chen <wander4...@gmail.com> wrote:

> modify
>
> /we just shutdown the cluster on the exit of the client running inside
> the cluster/
>
> to
>
> we just shutdown the cluster on both the exit of the client running
> inside the cluster and the finish of the job.
> Since the client is running inside the cluster, we can easily wait for
> both in ClusterEntrypoint.
>
>
> Zili Chen <wander4...@gmail.com> wrote on Fri, Sep 27, 2019 at 3:13 PM:
>
> > About JobCluster
> >
> > Actually I am not quite sure what we gain from the DETACHED
> > configuration on the cluster side.
> > We don't have a NON-DETACHED JobCluster in our codebase at the moment,
> > right?
> >
> > One major question comes to mind that we have to answer first:
> >
> > *What JobCluster conceptually is, exactly*
> >
> > Related discussion can be found in JIRA[1] and the mailing list[2].
> > Stephan gives a nice description of JobCluster:
> >
> > Two things to add:
> > - The job mode is very nice in the way that it runs the client inside
> > the cluster (in the same image/process that is the JM) and thus
> > unifies both applications and what the Spark world calls the "driver
> > mode".
> > - Another thing I would add is that during the FLIP-6 design, we were
> > thinking about setups where Dispatcher and JobManager are separate
> > processes. A Yarn or Mesos Dispatcher of a session could run
> > independently (even as a privileged process executing no code). Then
> > the "per-job" mode could still be helpful: when a job is submitted to
> > the dispatcher, it launches the JM again in per-job mode, so that JM
> > and TM processes are bound to the job only. For higher security
> > setups, it is important that processes are not reused across jobs.
> >
> > However, currently in "per-job" mode we generate the JobGraph on the
> > client side, launch the JobCluster and retrieve the JobGraph for
> > execution. So actually, we don't "run the client inside the cluster".
> >
> > Besides, referring to the discussion with Till[1], it would be helpful
> > if we followed the same process as session mode for "per-job" mode
> > from the user's perspective, i.e. we don't use
> > OptimizedPlanEnvironment to create the JobGraph, but directly deploy
> > the Flink cluster in env.execute.
> >
> > Generally, 2 points:
> >
> > 1. Run the Flink job by invoking the user main method and executing
> > throughout, instead of creating the JobGraph from the main class.
> > 2. Run the client inside the cluster.
> >
> > If 1 and 2 are implemented, there is obviously no need for a DETACHED
> > mode on the cluster side, because we just shutdown the cluster on the
> > exit of the client running inside the cluster. Whether or not the
> > result is delivered is up to user code.
> >
> > [1]
> > https://issues.apache.org/jira/browse/FLINK-14051?focusedCommentId=16931388&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16931388
> > [2]
> > https://lists.apache.org/x/thread.html/e8f14a381be6c027e8945f884c3cfcb309ce49c1ba557d3749fca495@%3Cdev.flink.apache.org%3E
> >
> >
> > Zili Chen <wander4...@gmail.com> wrote on Fri, Sep 27, 2019 at 2:13 PM:
> >
> >> Thanks for your replies Kostas & Aljoscha!
> >>
> >> Below are replies point by point.
> >>
> >> 1. For DETACHED mode, what I said there is about the DETACHED mode on
> >> the client side. Two configurations overload the term DETACHED[1].
> >>
> >> On the client side, it means whether or not client.submitJob blocks
> >> on the job execution result. Since client.submitJob returns
> >> CompletableFuture<JobClient>, NON-DETACHED has no power at all. The
> >> caller of submitJob decides whether or not to block to get the
> >> JobClient and request the job execution result. If the client
> >> crashes, it is a user-scope exception that should be handled in user
> >> code; if the client loses its connection to the cluster, we have
> >> retry count and interval configurations that retry automatically and
> >> throw a user-scope exception once exceeded.
> >>
> >> Your comment about polling for the result versus pushing the job
> >> result sounds like a concern on the cluster side.
> >>
> >> On the cluster side, DETACHED mode is alive only in JobCluster. If
> >> DETACHED is configured, the JobCluster exits when the job finishes;
> >> if NON-DETACHED is configured, the JobCluster exits once the job
> >> execution result is delivered. FLIP-74 doesn't propose changes in
> >> this scope; it remains as is.
> >>
> >> However, it is an interesting part and we can revisit this
> >> implementation a bit.
> >>
> >> <see the next email for a compact reply to this one>
> >>
> >> 2. The retrieval of JobClient is so important that if we don't have a
> >> way to retrieve a JobClient it is a dumb public user-facing
> >> interface (what a strange state :P).
> >>
> >> About the retrieval of JobClient, as mentioned in the document, two
> >> ways should be supported:
> >>
> >> (1) Retrieve it as the return value of job submission.
> >> (2) Retrieve a JobClient for an existing job (by job id).
> >>
> >> I highly respect your thoughts about how Executors should be and your
> >> thoughts on multi-layered clients.
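To make the client-side point concrete, here is a hedged sketch of why NON-DETACHED loses its meaning once submission is asynchronous: the same API serves both styles, and only the caller decides whether to block. JobClient, submitJob and the String result type below are simplified stand-ins, not the actual Flink interfaces:

```java
import java.util.concurrent.CompletableFuture;

// Simplified stand-in for the real Flink JobClient.
interface JobClient {
    CompletableFuture<String> getJobExecutionResult();
}

public class DetachedDemo {

    // Submission is always asynchronous; no DETACHED flag exists anywhere.
    // Here the future completes immediately, standing in for a real cluster.
    static CompletableFuture<JobClient> submitJob(String jobName) {
        return CompletableFuture.completedFuture(
                () -> CompletableFuture.completedFuture(jobName + ": DONE"));
    }

    public static void main(String[] args) throws Exception {
        // "Detached" caller: fire and forget, never joins the future.
        submitJob("wordcount");

        // "Attached" caller: the same API, but the caller chooses to block.
        String result = submitJob("wordcount")
                .thenCompose(JobClient::getJobExecutionResult)
                .get();
        System.out.println(result); // wordcount: DONE
    }
}
```

The blocking/non-blocking distinction is entirely in the calling code, which is exactly why a NON-DETACHED configuration value carries no information here.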
> >> Although (2) is not supported by public interfaces per the summary of
> >> the discussion above, we can discuss a bit the place of Executors
> >> among multi-layered clients and find a way to retrieve a JobClient
> >> for an existing job with the public client API. I will comment in the
> >> FLIP-73 thread[2] since it is almost entirely about Executors.
> >>
> >> Best,
> >> tison.
> >>
> >> [1]
> >> https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit?disco=AAAADnLLvM8
> >> [2]
> >> https://lists.apache.org/x/thread.html/dc3a541709f96906b43df4155373af1cd09e08c3f105b0bd0ba3fca2@%3Cdev.flink.apache.org%3E
> >>
> >>
> >> Kostas Kloudas <kklou...@gmail.com> wrote on Wed, Sep 25, 2019 at 9:29 PM:
> >>
> >>> Hi Tison,
> >>>
> >>> Thanks for the FLIP and for launching the discussion!
> >>>
> >>> As a first note, a big +1 on providing/exposing a JobClient to users!
> >>>
> >>> Some points that would be nice to clarify:
> >>> 1) You mention that we can get rid of the DETACHED mode: I agree
> >>> that at a high level, given that everything will now be
> >>> asynchronous, there is no need to keep the DETACHED mode, but I
> >>> think we should specify some aspects. For example, without the
> >>> explicit separation of the modes, what happens when the job
> >>> finishes? Does the client always periodically poll for the result,
> >>> or is the result pushed when in NON-DETACHED mode? What happens if
> >>> the client disconnects and reconnects?
> >>>
> >>> 2) On "how to retrieve a JobClient for a running job", I think this
> >>> is related to the other discussion you opened on the ML about
> >>> multi-layered clients. First of all, I agree that exposing different
> >>> "levels" of clients would be a nice addition, and actually there
> >>> have been some discussions about doing so in the future.
> >>> Now for this specific discussion:
> >>> i) I do not think that we should expose the
> >>> ClusterDescriptor/ClusterSpecification to the user, as this ties us
> >>> to a specific architecture which may change in the future.
> >>> ii) I do not think it should be the Executor that provides a
> >>> JobClient for an already running job (only for the jobs that it
> >>> submits). The job of the executor should just be to execute() a
> >>> pipeline.
> >>> iii) I think a solution that respects the separation of concerns
> >>> could be the addition of another component (in the future),
> >>> something like a ClientFactory or ClusterFactory, with methods like:
> >>> ClusterClient createCluster(Configuration), JobClient
> >>> retrieveJobClient(Configuration, JobId), and maybe even (although I
> >>> am not sure) Executor getExecutor(Configuration), and maybe more.
> >>> This component would be responsible for interacting with a cluster
> >>> manager like Yarn and doing what is now done by the
> >>> ClusterDescriptor, plus some more.
> >>>
> >>> Although under the hood all these abstractions (Environments,
> >>> Executors, ...) use the same clients, I believe their job/existence
> >>> is not contradictory; they simply hide some of the complexity from
> >>> the user and give us, as developers, some freedom to change some of
> >>> the parts in the future. For example, the executor will take a
> >>> Pipeline, create a JobGraph and submit it, instead of requiring the
> >>> user to do each step separately. This allows us to, for example, get
> >>> rid of the Plan if in the future everything is DataStream.
> >>> Essentially, I think of these as layers of an onion with the clients
> >>> close to the core. The higher you go, the more functionality is
> >>> included and hidden from the public eye.
> >>>
> >>> Point iii), by the way, is just a thought and by no means final.
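A hedged sketch of what point iii) could look like. Every type below is a hypothetical stand-in (the real Flink types have different shapes); the only idea being illustrated is that the factory is the single component that knows how to reach the cluster manager, and both Executors and JobClients are obtained through it rather than constructed by user code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the types mentioned above; illustrative only.
class Configuration { final Map<String, String> opts = new HashMap<>(); }
class JobId { final String id; JobId(String id) { this.id = id; } }
interface ClusterClient { String clusterId(); }
interface JobClient { JobId jobId(); }
interface Executor { JobClient execute(String pipeline); }

// The factory from point iii): the one place that talks to a cluster
// manager (Yarn, ...), replacing what ClusterDescriptor does today.
interface ClientFactory {
    ClusterClient createCluster(Configuration conf);
    JobClient retrieveJobClient(Configuration conf, JobId jobId);
    Executor getExecutor(Configuration conf);
}

public class FactorySketch {

    // Dummy in-memory factory, standing in for a Yarn/standalone one.
    static ClientFactory demoFactory() {
        return new ClientFactory() {
            public ClusterClient createCluster(Configuration conf) {
                return () -> "cluster-1";
            }
            public JobClient retrieveJobClient(Configuration conf, JobId jobId) {
                return () -> jobId;
            }
            public Executor getExecutor(Configuration conf) {
                return pipeline -> () -> new JobId("job-for-" + pipeline);
            }
        };
    }

    public static void main(String[] args) {
        // Attach to a job that is already running, knowing only its id --
        // the capability the Executor itself should not have to provide.
        JobClient attached =
                demoFactory().retrieveJobClient(new Configuration(), new JobId("abc"));
        System.out.println(attached.jobId().id); // abc
    }
}
```

Note how execute() on the Executor also yields a JobClient, so case (1) from the earlier email (JobClient as the return of submission) and case (2) (retrieval by id) live at different layers of the onion.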
> >>> I also like the idea of multi-layered clients, so this may spark up
> >>> the discussion.
> >>>
> >>> Cheers,
> >>> Kostas
> >>>
> >>> On Wed, Sep 25, 2019 at 2:21 PM Aljoscha Krettek <aljos...@apache.org>
> >>> wrote:
> >>> >
> >>> > Hi Tison,
> >>> >
> >>> > Thanks for proposing the document! I had some comments on it.
> >>> >
> >>> > I think the only complex thing that we still need to figure out is
> >>> > how to get a JobClient for a job that is already running, as you
> >>> > mentioned in the document. Currently I'm thinking that it's ok to
> >>> > add a method to Executor for retrieving a JobClient for a running
> >>> > job by providing an ID. Let's see what Kostas has to say on the
> >>> > topic.
> >>> >
> >>> > Best,
> >>> > Aljoscha
> >>> >
> >>> > > On 25. Sep 2019, at 12:31, Zili Chen <wander4...@gmail.com> wrote:
> >>> > >
> >>> > > Hi all,
> >>> > >
> >>> > > Summarizing the discussion about introducing a Flink JobClient
> >>> > > API[1], we drafted FLIP-74[2] to gather thoughts and move
> >>> > > towards standard public user-facing interfaces.
> >>> > >
> >>> > > This discussion thread aims at standardizing the job-level
> >>> > > client API. But I'd like to emphasize that how to retrieve a
> >>> > > JobClient may trigger further discussion on the different levels
> >>> > > of clients exposed by Flink, so a follow-up thread will be
> >>> > > started later to coordinate FLIP-73 and FLIP-74 on the exposure
> >>> > > issue.
> >>> > >
> >>> > > Looking forward to your opinions.
> >>> > >
> >>> > > Best,
> >>> > > tison.
> >>> > >
> >>> > > [1]
> >>> > > https://lists.apache.org/thread.html/ce99cba4a10b9dc40eb729d39910f315ae41d80ec74f09a356c73938@%3Cdev.flink.apache.org%3E
> >>> > > [2]
> >>> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-74%3A+Flink+JobClient+API
> >>>
> >>