On Sun, Aug 18, 2019 at 12:34 PM Thomas Weise <[email protected]> wrote:

> There is a PR open for this: https://github.com/apache/beam/pull/9331
>
> (it wasn't tagged with the JIRA and therefore not linked)
>
> I think it is worthwhile to explore how we could further detangle the
> client side Python and Java dependencies.
>
> The expansion service is one more dependency to consider in a build
> environment. Is it really necessary to expand external transforms prior to
> submission to the job service?
>

+1, this will make it easier to use external transforms from the already
familiar client environments.


>
> Can we come up with a partially constructed proto that can be produced by
> just running the Python entry point? Note this would also require pushing
> the pipeline options parsing into the job service.
>

Why would this require pushing the pipeline options parsing to the job
service. Assuming that python will have enough idea about the external
transform what options it will need. The necessary bit could be converted
to arguments and be part of that partially constructed proto.


>
> On Sun, Aug 18, 2019 at 12:01 PM enrico canzonieri <[email protected]>
> wrote:
>
>> I found the tracking ticket at BEAM-7966
>> <https://jira.apache.org/jira/browse/BEAM-7966>
>>
>> On Sun, Aug 18, 2019 at 11:59 AM enrico canzonieri <[email protected]>
>> wrote:
>>
>>> Is this alternative still being considered? Creating a portable jar
>>> sounds like a good solution to re-use the existing runner specific
>>> deployment mechanism (e.g. Flink k8s operator) and in general simplify the
>>> deployment story.
>>>
>>> On Fri, Aug 9, 2019 at 12:46 AM Robert Bradshaw <[email protected]>
>>> wrote:
>>>
>>>> The expansion service is a separate service. (The flink jar happens to
>>>> bring both up.) However, there is negotiation to receive/validate the
>>>> pipeline options.
>>>>
>>>> On Fri, Aug 9, 2019 at 1:54 AM Thomas Weise <[email protected]> wrote:
>>>> >
>>>> > We would also need to consider cross-language pipelines that
>>>> (currently) assume the interaction with an expansion service at
>>>> construction time.
>>>> >
>>>> > On Thu, Aug 8, 2019, 4:38 PM Kyle Weaver <[email protected]> wrote:
>>>> >>
>>>> >> > It might also be useful to have the option to just output the
>>>> proto and artifacts, as alternative to the jar file.
>>>> >>
>>>> >> Sure, that wouldn't be too big a change if we were to decide to go
>>>> the SDK route.
>>>> >>
>>>> >> > For the Flink entry point we would need to allow for the job
>>>> server to be used as a library.
>>>> >>
>>>> >> We don't need the whole job server, we only need to add a main
>>>> method to FlinkPipelineRunner [1] as the entry point, which would basically
>>>> just do the setup described in the doc then call FlinkPipelineRunner::run.
>>>> >>
>>>> >> [1]
>>>> https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java#L53
>>>> >>
>>>> >> Kyle Weaver | Software Engineer | github.com/ibzib |
>>>> [email protected]
>>>> >>
>>>> >>
>>>> >> On Thu, Aug 8, 2019 at 4:21 PM Thomas Weise <[email protected]> wrote:
>>>> >>>
>>>> >>> Hi Kyle,
>>>> >>>
>>>> >>> It might also be useful to have the option to just output the proto
>>>> and artifacts, as alternative to the jar file.
>>>> >>>
>>>> >>> For the Flink entry point we would need to allow for the job server
>>>> to be used as a library. It would probably not be too hard to have the
>>>> Flink job constructed via the context execution environment, which would
>>>> require no changes on the Flink side.
>>>> >>>
>>>> >>> Thanks,
>>>> >>> Thomas
>>>> >>>
>>>> >>>
>>>> >>> On Thu, Aug 8, 2019 at 9:52 AM Kyle Weaver <[email protected]>
>>>> wrote:
>>>> >>>>
>>>> >>>> Re Javaless/serverless solution:
>>>> >>>> I take it this would probably mean that we would construct the jar
>>>> directly from the SDK. There are advantages to this: full separation of
>>>> Python and Java environments, no need for a job server, and likely a
>>>> simpler implementation, since we'd no longer have to work within the
>>>> constraints of the existing job server infrastructure. The only downside I
>>>> can think of is the additional cost of implementing/maintaining jar
>>>> creation code in each SDK, but that cost may be acceptable if it's simple
>>>> enough.
>>>> >>>>
>>>> >>>> Kyle Weaver | Software Engineer | github.com/ibzib |
>>>> [email protected]
>>>> >>>>
>>>> >>>>
>>>> >>>> On Thu, Aug 8, 2019 at 9:31 AM Thomas Weise <[email protected]>
>>>> wrote:
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> On Thu, Aug 8, 2019 at 8:29 AM Robert Bradshaw <
>>>> [email protected]> wrote:
>>>> >>>>>>
>>>> >>>>>> > Before assembling the jar, the job server runs to create the
>>>> ingredients. That requires the (matching) Java environment on the Python
>>>> developers machine.
>>>> >>>>>>
>>>> >>>>>> We can run the job server and have it create the jar (and if we
>>>> keep
>>>> >>>>>> the job server running we can use it to interact with the running
>>>> >>>>>> job). However, if the jar layout is simple enough, there's no
>>>> need to
>>>> >>>>>> even build it from Java.
>>>> >>>>>>
>>>> >>>>>> Taken to the extreme, this is a one-shot, jar-based JobService
>>>> API. We
>>>> >>>>>> choose a standard layout of where to put the pipeline
>>>> description and
>>>> >>>>>> artifacts, and can "augment" an existing jar (that has a
>>>> >>>>>> runner-specific main class whose entry point knows how to read
>>>> this
>>>> >>>>>> data to kick off a pipeline as if it were a users driver code)
>>>> into
>>>> >>>>>> one that has a portable pipeline packaged into it for submission
>>>> to a
>>>> >>>>>> cluster.
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> It would be nice if the Python developer doesn't have to run
>>>> anything Java at all.
>>>> >>>>>
>>>> >>>>> As we just discussed offline, this could be accomplished by
>>>> including the proto that is produced by the SDK into the pre-existing jar.
>>>> >>>>>
>>>> >>>>> And if the jar has an entry point that creates the Flink job in
>>>> the prescribed manner [1], it can be directly submitted to the Flink REST
>>>> API. That would allow for Java free client.
>>>> >>>>>
>>>> >>>>> [1]
>>>> https://lists.apache.org/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
>>>> >>>>>
>>>>
>>>

Reply via email to