On Mon, Aug 19, 2019 at 5:52 PM Ahmet Altay <al...@google.com> wrote:
> On Sun, Aug 18, 2019 at 12:34 PM Thomas Weise <t...@apache.org> wrote:
>
>> There is a PR open for this: https://github.com/apache/beam/pull/9331
>> (it wasn't tagged with the JIRA and therefore not linked)
>>
>> I think it is worthwhile to explore how we could further detangle the
>> client-side Python and Java dependencies.
>>
>> The expansion service is one more dependency to consider in a build
>> environment. Is it really necessary to expand external transforms prior
>> to submission to the job service?
>
> +1, this will make it easier to use external transforms from the already
> familiar client environments.

The intent is to make it so that you CAN (not MUST) run an expansion
service separate from a Runner. Creating a single endpoint that hosts both
the Job and Expansion services is something gRPC does very easily, since
you can host multiple service definitions on a single port.

>> Can we come up with a partially constructed proto that can be produced
>> by just running the Python entry point? Note this would also require
>> pushing the pipeline options parsing into the job service.
>
> Why would this require pushing the pipeline options parsing to the job
> service? Assuming that Python has enough of an idea about the external
> transform to know what options it will need, the necessary bits could be
> converted to arguments and be part of that partially constructed proto.
>
>> On Sun, Aug 18, 2019 at 12:01 PM enrico canzonieri <ecanzoni...@gmail.com> wrote:
>>
>>> I found the tracking ticket at BEAM-7966
>>> <https://jira.apache.org/jira/browse/BEAM-7966>
>>>
>>> On Sun, Aug 18, 2019 at 11:59 AM enrico canzonieri <ecanzoni...@gmail.com> wrote:
>>>
>>>> Is this alternative still being considered? Creating a portable jar
>>>> sounds like a good solution to reuse the existing runner-specific
>>>> deployment mechanisms (e.g. the Flink k8s operator) and in general
>>>> simplify the deployment story.
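[Editor's note: the "converted to arguments" idea above can be sketched in a few lines of Python. This is a hypothetical helper, not Beam's actual PipelineOptions machinery; it only illustrates how a flat `--key=value` argument list could travel inside a partially constructed proto.]

```python
def options_to_args(options):
    """Render a dict of pipeline options as --key=value flags.

    Hypothetical helper: Beam's real options handling is richer, but a
    partially constructed proto only needs a flat list of string args.
    """
    args = []
    for key, value in sorted(options.items()):
        if isinstance(value, bool):
            # Boolean flags are emitted only when set, without a value.
            if value:
                args.append("--%s" % key)
        else:
            args.append("--%s=%s" % (key, value))
    return args

print(options_to_args({"runner": "FlinkRunner", "parallelism": 4, "streaming": True}))
# → ['--parallelism=4', '--runner=FlinkRunner', '--streaming']
```

The job service (or the runner's entry point) would then parse these arguments on its side, which is why the client would not need to fully resolve the options itself.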
>>>> On Fri, Aug 9, 2019 at 12:46 AM Robert Bradshaw <rober...@google.com> wrote:
>>>>
>>>>> The expansion service is a separate service. (The Flink jar happens
>>>>> to bring both up.) However, there is negotiation to receive/validate
>>>>> the pipeline options.
>>>>>
>>>>> On Fri, Aug 9, 2019 at 1:54 AM Thomas Weise <t...@apache.org> wrote:
>>>>>
>>>>>> We would also need to consider cross-language pipelines that
>>>>>> (currently) assume interaction with an expansion service at
>>>>>> construction time.
>>>>>>
>>>>>> On Thu, Aug 8, 2019, 4:38 PM Kyle Weaver <kcwea...@google.com> wrote:
>>>>>>
>>>>>>> > It might also be useful to have the option to just output the
>>>>>>> > proto and artifacts, as an alternative to the jar file.
>>>>>>>
>>>>>>> Sure, that wouldn't be too big a change if we were to decide to go
>>>>>>> the SDK route.
>>>>>>>
>>>>>>> > For the Flink entry point we would need to allow for the job
>>>>>>> > server to be used as a library.
>>>>>>>
>>>>>>> We don't need the whole job server; we only need to add a main
>>>>>>> method to FlinkPipelineRunner [1] as the entry point, which would
>>>>>>> basically just do the setup described in the doc and then call
>>>>>>> FlinkPipelineRunner::run.
>>>>>>>
>>>>>>> [1] https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java#L53
>>>>>>>
>>>>>>> Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
>>>>>>>
>>>>>>> On Thu, Aug 8, 2019 at 4:21 PM Thomas Weise <t...@apache.org> wrote:
>>>>>>>
>>>>>>>> Hi Kyle,
>>>>>>>>
>>>>>>>> It might also be useful to have the option to just output the
>>>>>>>> proto and artifacts, as an alternative to the jar file.
>>>>>>>>
>>>>>>>> For the Flink entry point we would need to allow for the job
>>>>>>>> server to be used as a library.
>>>>>>>> It would probably not be too hard to have the Flink job
>>>>>>>> constructed via the context execution environment, which would
>>>>>>>> require no changes on the Flink side.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Thomas
>>>>>>>>
>>>>>>>> On Thu, Aug 8, 2019 at 9:52 AM Kyle Weaver <kcwea...@google.com> wrote:
>>>>>>>>
>>>>>>>>> Re Javaless/serverless solution:
>>>>>>>>> I take it this would probably mean that we would construct the
>>>>>>>>> jar directly from the SDK. There are advantages to this: full
>>>>>>>>> separation of Python and Java environments, no need for a job
>>>>>>>>> server, and likely a simpler implementation, since we'd no
>>>>>>>>> longer have to work within the constraints of the existing job
>>>>>>>>> server infrastructure. The only downside I can think of is the
>>>>>>>>> additional cost of implementing/maintaining jar creation code in
>>>>>>>>> each SDK, but that cost may be acceptable if it's simple enough.
>>>>>>>>>
>>>>>>>>> Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
>>>>>>>>>
>>>>>>>>> On Thu, Aug 8, 2019 at 9:31 AM Thomas Weise <t...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> On Thu, Aug 8, 2019 at 8:29 AM Robert Bradshaw <rober...@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> > Before assembling the jar, the job server runs to create the
>>>>>>>>>>> > ingredients. That requires the (matching) Java environment
>>>>>>>>>>> > on the Python developer's machine.
>>>>>>>>>>>
>>>>>>>>>>> We can run the job server and have it create the jar (and if
>>>>>>>>>>> we keep the job server running, we can use it to interact with
>>>>>>>>>>> the running job). However, if the jar layout is simple enough,
>>>>>>>>>>> there's no need to even build it from Java.
>>>>>>>>>>>
>>>>>>>>>>> Taken to the extreme, this is a one-shot, jar-based JobService
>>>>>>>>>>> API.
We >>>>> >>>>>> choose a standard layout of where to put the pipeline >>>>> description and >>>>> >>>>>> artifacts, and can "augment" an existing jar (that has a >>>>> >>>>>> runner-specific main class whose entry point knows how to read >>>>> this >>>>> >>>>>> data to kick off a pipeline as if it were a users driver code) >>>>> into >>>>> >>>>>> one that has a portable pipeline packaged into it for >>>>> submission to a >>>>> >>>>>> cluster. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> It would be nice if the Python developer doesn't have to run >>>>> anything Java at all. >>>>> >>>>> >>>>> >>>>> As we just discussed offline, this could be accomplished by >>>>> including the proto that is produced by the SDK into the pre-existing jar. >>>>> >>>>> >>>>> >>>>> And if the jar has an entry point that creates the Flink job in >>>>> the prescribed manner [1], it can be directly submitted to the Flink REST >>>>> API. That would allow for Java free client. >>>>> >>>>> >>>>> >>>>> [1] >>>>> https://lists.apache.org/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E >>>>> >>>>> >>>>> >>>>