Hi Flavio,

I agree that this would be good to have. But I also think that this is outside the scope of FLIP-74; it is an orthogonal feature.
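The manifest check that Flavio suggests in the quoted message below can be sketched in plain Java. This is a minimal sketch, not Flink code: the class `JarEntryPoints` and both of its methods are hypothetical names, and only the standard `Main-Class` attribute is read through the JDK's `java.util.jar` API. `Main-classes` is not a standard manifest attribute, so it would have to be looked up by name with `getValue(String)`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

/** Hypothetical helper for illustration; not part of Flink. */
final class JarEntryPoints {

    /** Returns the Main-Class manifest attribute of the jar, if there is one. */
    static Optional<String> mainClassOf(Path jar) {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Manifest manifest = jarFile.getManifest(); // null when the jar has no manifest
            if (manifest == null) {
                return Optional.empty();
            }
            return Optional.ofNullable(
                    manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo only: writes a temporary jar whose manifest names the given main class. */
    static Path writeDemoJar(String mainClass) {
        try {
            Manifest manifest = new Manifest();
            // Manifest-Version must be set, otherwise the main attributes are not written.
            manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
            manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
            Path jar = Files.createTempFile("entry-points-demo", ".jar");
            try (OutputStream out = Files.newOutputStream(jar);
                 JarOutputStream jarOut = new JarOutputStream(out, manifest)) {
                // No class entries are needed; the manifest alone is enough here.
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path jar = writeDemoJar("com.example.WordCount");
        System.out.println(mainClassOf(jar).orElse("<no Main-Class>"));
    }
}
```

A REST handler for something like `/jars/:jarid/entry-points` could call a helper of this kind on the uploaded jar. Discovering the typed parameters of each entry point is a separate problem that this sketch does not address.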
Best,
Aljoscha

> On 27. Sep 2019, at 10:31, Flavio Pompermaier <pomperma...@okkam.it> wrote:
>
> Hi all,
> just a remark about the Flink REST APIs (and its client as well): almost
> always, we need a way to dynamically know which jobs are contained in
> a jar file, and this could be exposed by the REST endpoint under
> /jars/:jarid/entry-points (a simple way to implement this would be to check
> the value of Main-class or Main-classes inside the Manifest of the jar, if
> they exist [1]).
>
> I understand that this is not strictly required to
> execute Flink jobs, but IMHO it would ease A LOT the work of UI developers,
> who would have a way to show users all available jobs inside a jar plus
> their configurable parameters.
> For example, right now in the WebUI, you can upload a jar and then you have
> to set (without any autocomplete or UI support) the main class and its
> params (for example using a string like --param1 xx --param2 yy).
> Adding this functionality to the REST API and the respective client would
> enable the WebUI (and all UIs interacting with a Flink cluster) to prefill
> a dropdown list containing the entry-point classes (i.e. Flink
> jobs) and, once one is selected, its required (typed) parameters.
>
> Best,
> Flavio
>
> [1] https://issues.apache.org/jira/browse/FLINK-10864
>
> On Fri, Sep 27, 2019 at 9:16 AM Zili Chen <wander4...@gmail.com> wrote:
>
>> modify
>>
>> /we just shutdown the cluster on the exit of client that running inside
>> cluster/
>>
>> to
>>
>> we just shutdown the cluster on both the exit of client that running inside
>> cluster and the finish of job.
>> Since the client is running inside the cluster, we can easily wait for the
>> end of both in ClusterEntrypoint.
>>
>>
>> On Fri, Sep 27, 2019 at 3:13 PM, Zili Chen <wander4...@gmail.com> wrote:
>>
>>> About JobCluster
>>>
>>> Actually I am not quite sure what we gain from the DETACHED configuration
>>> on the cluster side.
>>> We don't in fact have a NON-DETACHED JobCluster in our codebase, right?
>>>
>>> This brings me to one major question we have to answer first.
>>>
>>> *What JobCluster conceptually is exactly*
>>>
>>> Related discussion can be found in JIRA[1] and on the mailing list[2].
>>> Stephan gives a nice description of JobCluster:
>>>
>>> Two things to add:
>>> - The job mode is very nice in the way that it runs the
>>> client inside the cluster (in the same image/process that is the JM) and
>>> thus unifies both applications and what the Spark world calls the "driver
>>> mode".
>>> - Another thing I would add is that during the FLIP-6 design, we
>>> were thinking about setups where Dispatcher and JobManager are separate
>>> processes. A Yarn or Mesos Dispatcher of a session could run independently
>>> (even as privileged processes executing no code). Then the "per-job"
>>> mode could still be helpful: when a job is submitted to the dispatcher, it
>>> launches the JM again in a per-job mode, so that JM and TM processes are
>>> bound to the job only. For higher security setups, it is important that
>>> processes are not reused across jobs.
>>>
>>> However, currently in "per-job" mode we generate the JobGraph on the
>>> client side, launch the JobCluster, and retrieve the JobGraph for
>>> execution. So actually, we don't "run the client inside the cluster".
>>>
>>> Besides, referring to the discussion with Till[1], it would be helpful if,
>>> from the user's perspective, "per-job" mode followed the same process as
>>> session mode: we don't use OptimizedPlanEnvironment to create the
>>> JobGraph, but directly deploy the Flink cluster in env.execute.
>>>
>>> Generally 2 points:
>>>
>>> 1. Run the Flink job by invoking the user main method and executing
>>> throughout, instead of creating the JobGraph from the main class.
>>> 2. Run the client inside the cluster.
>>>
>>> If 1 and 2 are implemented.
>>> There is obviously no need for DETACHED mode on the cluster side
>>> because we just shutdown the cluster on the exit of client that running
>>> inside cluster. Whether or not the result is delivered is up to user code.
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-14051?focusedCommentId=16931388&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16931388
>>> [2] https://lists.apache.org/x/thread.html/e8f14a381be6c027e8945f884c3cfcb309ce49c1ba557d3749fca495@%3Cdev.flink.apache.org%3E
>>>
>>>
>>> On Fri, Sep 27, 2019 at 2:13 PM, Zili Chen <wander4...@gmail.com> wrote:
>>>
>>>> Thanks for your replies Kostas & Aljoscha!
>>>>
>>>> Below are replies point by point.
>>>>
>>>> 1. For DETACHED mode, what I said there is about the DETACHED mode on the
>>>> client side.
>>>> Two configurations overload the term DETACHED[1].
>>>>
>>>> On the client side, it means whether or not client.submitJob blocks until
>>>> the job execution result is available.
>>>> Because client.submitJob returns CompletableFuture<JobClient>,
>>>> NON-DETACHED has no power at all. The caller of submitJob decides whether
>>>> or not to block to get the JobClient and request the job execution
>>>> result. If the client crashes, it is a user scope exception that should
>>>> be handled in user code; if the client loses its connection to the
>>>> cluster, we have a retry count and interval configuration that retries
>>>> automatically and throws a user scope exception if it is exceeded.
>>>>
>>>> Your comment about polling for the result or job result sounds like a
>>>> concern on the cluster side.
>>>>
>>>> On the cluster side, DETACHED mode is alive only in JobCluster. If
>>>> DETACHED is configured, JobCluster exits when the job finishes; if
>>>> NON-DETACHED is configured, JobCluster exits when the job execution
>>>> result is delivered. FLIP-74 doesn't propose changes in this scope; it is
>>>> left as is.
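The client-side point above, that a future-returning submitJob leaves the blocking decision to the caller, can be illustrated with a small Java sketch. Everything here is an assumption made for illustration: `JobClient`, `FakeClusterClient` and `SubmitModes` are hypothetical stand-ins that run without Flink, and the job result is simplified to a String.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for the JobClient discussed in this thread. */
interface JobClient {
    String getJobId();

    /** Completes when the job finishes; the result is simplified to a String. */
    CompletableFuture<String> getJobExecutionResult();
}

/** Hypothetical in-memory "cluster" so that the sketch runs without Flink. */
final class FakeClusterClient {
    CompletableFuture<JobClient> submitJob(String jobName) {
        JobClient jobClient = new JobClient() {
            @Override
            public String getJobId() {
                return "job-" + jobName;
            }

            @Override
            public CompletableFuture<String> getJobExecutionResult() {
                // Simulate a job that finishes asynchronously.
                return CompletableFuture.supplyAsync(() -> jobName + ": FINISHED");
            }
        };
        return CompletableFuture.completedFuture(jobClient);
    }
}

final class SubmitModes {
    /** Detached style: wait only for the submission, not for the job result. */
    static String submitDetached(FakeClusterClient cluster, String jobName) {
        return cluster.submitJob(jobName).join().getJobId();
    }

    /** Attached style: the caller chooses to block until the result arrives. */
    static String submitAndWait(FakeClusterClient cluster, String jobName) {
        return cluster.submitJob(jobName)
                .thenCompose(JobClient::getJobExecutionResult)
                .join();
    }

    public static void main(String[] args) {
        FakeClusterClient cluster = new FakeClusterClient();
        System.out.println(submitDetached(cluster, "wordcount"));
        System.out.println(submitAndWait(cluster, "wordcount"));
    }
}
```

Both methods call the same `submitJob`; the only difference is whether the caller composes and joins on the result future, which is why a separate client-side DETACHED flag adds nothing in this model.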
>>>>
>>>> However, it is an interesting part, and we can revisit this
>>>> implementation a bit.
>>>>
>>>> <see the next email for compact reply in this one>
>>>>
>>>> 2. The retrieval of JobClient is so important that if we don't have a way
>>>> to retrieve a JobClient, it is a dumb public user-facing interface (what
>>>> a strange state :P).
>>>>
>>>> About the retrieval of JobClient, as mentioned in the document, two ways
>>>> should be supported.
>>>>
>>>> (1). Retrieved as the return type of job submission.
>>>> (2). Retrieve a JobClient of an existing job (with job id).
>>>>
>>>> I highly respect your thoughts about how Executors should be and your
>>>> thoughts on multi-layered clients.
>>>> Although (2) is not supported by public interfaces according to the
>>>> summary of the discussion above, we can discuss a bit the place of
>>>> Executors in multi-layered clients and find a way to retrieve the
>>>> JobClient of an existing job with a public client API. I will comment in
>>>> the FLIP-73 thread[2] since it is mostly about Executors.
>>>>
>>>> Best,
>>>> tison.
>>>>
>>>> [1] https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit?disco=AAAADnLLvM8
>>>> [2] https://lists.apache.org/x/thread.html/dc3a541709f96906b43df4155373af1cd09e08c3f105b0bd0ba3fca2@%3Cdev.flink.apache.org%3E
>>>>
>>>>
>>>> On Wed, Sep 25, 2019 at 9:29 PM, Kostas Kloudas <kklou...@gmail.com> wrote:
>>>>
>>>>> Hi Tison,
>>>>>
>>>>> Thanks for the FLIP and launching the discussion!
>>>>>
>>>>> As a first note, big +1 on providing/exposing a JobClient to the users!
>>>>>
>>>>> Some points that would be nice to be clarified:
>>>>> 1) You mention that we can get rid of the DETACHED mode: I agree that
>>>>> at a high level, given that everything will now be asynchronous, there
>>>>> is no need to keep the DETACHED mode but I think we should specify
>>>>> some aspects. For example, without the explicit separation of the
>>>>> modes, what happens when the job finishes?
>>>>> Does the client
>>>>> always poll periodically for the result, or is the result pushed when
>>>>> in NON-DETACHED mode? What happens if the client disconnects and
>>>>> reconnects?
>>>>>
>>>>> 2) On the "how to retrieve a JobClient for a running Job", I think
>>>>> this is related to the other discussion you opened in the ML about
>>>>> multi-layered clients. First of all, I agree that exposing different
>>>>> "levels" of clients would be a nice addition, and actually there have
>>>>> been some discussions about doing so in the future. Now for this
>>>>> specific discussion:
>>>>> i) I do not think that we should expose the
>>>>> ClusterDescriptor/ClusterSpecification to the user, as this ties us to
>>>>> a specific architecture which may change in the future.
>>>>> ii) I do not think it should be the Executor that will provide a
>>>>> JobClient for an already running job (only for the Jobs that it
>>>>> submits). The job of the executor should just be to execute() a
>>>>> pipeline.
>>>>> iii) I think a solution that respects the separation of concerns
>>>>> could be the addition of another component (in the future), something
>>>>> like a ClientFactory, or ClusterFactory that will have methods like:
>>>>> ClusterClient createCluster(Configuration), JobClient
>>>>> retrieveJobClient(Configuration, JobId), maybe even (although not
>>>>> sure) Executor getExecutor(Configuration) and maybe more. This
>>>>> component would be responsible for interacting with a cluster manager
>>>>> like Yarn and doing what is now being done by the ClusterDescriptor
>>>>> plus some more stuff.
>>>>>
>>>>> Although under the hood all these abstractions (Environments,
>>>>> Executors, ...) use the same clients, I believe their
>>>>> job/existence is not contradictory; they simply hide some of the
>>>>> complexity from the user, and give us, as developers, some freedom to
>>>>> change some of the parts in the future.
>>>>> For example, the executor will
>>>>> take a Pipeline, create a JobGraph and submit it, instead of requiring
>>>>> the user to do each step separately. This allows us to, for example,
>>>>> get rid of the Plan if in the future everything is DataStream.
>>>>> Essentially, I think of these as layers of an onion with the clients
>>>>> being close to the core. The higher you go, the more functionality is
>>>>> included and hidden from the public eye.
>>>>>
>>>>> Point iii) by the way is just a thought and by no means final. I also
>>>>> like the idea of multi-layered clients so this may spark up the
>>>>> discussion.
>>>>>
>>>>> Cheers,
>>>>> Kostas
>>>>>
>>>>> On Wed, Sep 25, 2019 at 2:21 PM Aljoscha Krettek <aljos...@apache.org> wrote:
>>>>>>
>>>>>> Hi Tison,
>>>>>>
>>>>>> Thanks for proposing the document! I had some comments on the document.
>>>>>>
>>>>>> I think the only complex thing that we still need to figure out is how
>>>>>> to get a JobClient for a job that is already running, as you mentioned
>>>>>> in the document. Currently I'm thinking that it's OK to add a method to
>>>>>> Executor for retrieving a JobClient for a running job by providing an
>>>>>> ID. Let's see what Kostas has to say on the topic.
>>>>>>
>>>>>> Best,
>>>>>> Aljoscha
>>>>>>
>>>>>>> On 25. Sep 2019, at 12:31, Zili Chen <wander4...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> As a summary of the discussion about introducing the Flink JobClient
>>>>>>> API[1], we drafted FLIP-74[2] to gather thoughts and move towards a
>>>>>>> standard public user-facing interface.
>>>>>>>
>>>>>>> This discussion thread aims at standardizing the job-level client API.
>>>>>>> But I'd like to emphasize that how to retrieve a JobClient possibly
>>>>>>> causes further discussion on the different levels of clients exposed
>>>>>>> from Flink, so a follow-up thread will be started later to coordinate
>>>>>>> FLIP-73 and FLIP-74 on the exposure issue.
>>>>>>>
>>>>>>> Looking forward to your opinions.
>>>>>>>
>>>>>>> Best,
>>>>>>> tison.
>>>>>>>
>>>>>>> [1] https://lists.apache.org/thread.html/ce99cba4a10b9dc40eb729d39910f315ae41d80ec74f09a356c73938@%3Cdev.flink.apache.org%3E
>>>>>>> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-74%3A+Flink+JobClient+API
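For readers following the thread, the factory component that Kostas floats in his point iii) can be written down as a Java interface. This is only a sketch under stated assumptions: every type below (`Configuration`, `JobId`, `ClusterClient`, `JobClient`, `ClientFactory`, `InMemoryClientFactory`) is a hypothetical placeholder rather than a Flink class, the `Optional` return type is an addition of this sketch, and the tentative `getExecutor` method is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** All types in this block are hypothetical placeholders, not Flink classes. */
final class Configuration {
    final Map<String, String> values = new HashMap<>();
}

final class JobId {
    final String value;

    JobId(String value) {
        this.value = value;
    }
}

interface ClusterClient {
    String clusterName();
}

interface JobClient {
    JobId getJobId();
}

/** Shape of the factory component floated in point iii) of the thread. */
interface ClientFactory {
    ClusterClient createCluster(Configuration configuration);

    // Optional is an assumption of this sketch, used to signal "no such job".
    Optional<JobClient> retrieveJobClient(Configuration configuration, JobId jobId);
}

/** Toy implementation that tracks jobs in a map instead of talking to Yarn. */
final class InMemoryClientFactory implements ClientFactory {
    private final Map<String, JobClient> runningJobs = new HashMap<>();

    /** Demo hook: pretend that a job with this id is already running. */
    void registerRunningJob(JobId jobId) {
        runningJobs.put(jobId.value, () -> jobId);
    }

    @Override
    public ClusterClient createCluster(Configuration configuration) {
        String name = configuration.values.getOrDefault("cluster.name", "local");
        return () -> name;
    }

    @Override
    public Optional<JobClient> retrieveJobClient(Configuration configuration, JobId jobId) {
        return Optional.ofNullable(runningJobs.get(jobId.value));
    }

    public static void main(String[] args) {
        InMemoryClientFactory factory = new InMemoryClientFactory();
        factory.registerRunningJob(new JobId("a1"));
        Configuration conf = new Configuration();
        System.out.println(factory.retrieveJobClient(conf, new JobId("a1")).isPresent());
        System.out.println(factory.retrieveJobClient(conf, new JobId("zz")).isPresent());
    }
}
```

Keeping retrieval on a factory rather than on the Executor matches the separation of concerns argued for in point ii): the Executor only executes pipelines, while looking up clients for already running jobs lives in a separate component.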