On Tue, Aug 20, 2019 at 8:56 AM Lukasz Cwik <[email protected]> wrote:
>
> On Mon, Aug 19, 2019 at 5:52 PM Ahmet Altay <[email protected]> wrote:
>>
>> On Sun, Aug 18, 2019 at 12:34 PM Thomas Weise <[email protected]> wrote:
>>>
>>> There is a PR open for this: https://github.com/apache/beam/pull/9331
>>> (it wasn't tagged with the JIRA and therefore not linked)
>>>
>>> I think it is worthwhile to explore how we could further detangle the
>>> client-side Python and Java dependencies.
>>>
>>> The expansion service is one more dependency to consider in a build
>>> environment. Is it really necessary to expand external transforms prior
>>> to submission to the job service?
>>
>> +1, this will make it easier to use external transforms from the already
>> familiar client environments.
>
> The intent is to make it so that you CAN (not MUST) run an expansion
> service separate from a runner. Creating a single endpoint that hosts
> both the Job and Expansion services is something gRPC does very easily,
> since you can host multiple service definitions on a single port.

Yes, that's fine. The point here is when the expansion occurs. I believe
the runner can also invoke the expansion service, thereby eliminating the
expansion service interaction from the client side.

>>> Can we come up with a partially constructed proto that can be produced
>>> by just running the Python entry point? Note this would also require
>>> pushing the pipeline options parsing into the job service.
>>
>> Why would this require pushing the pipeline options parsing to the job
>> service? Assuming Python knows enough about the external transform to
>> determine which options it needs, the necessary bits could be converted
>> to arguments and included in that partially constructed proto.
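To make the "partially constructed proto" idea concrete, here is a minimal sketch. Plain Python dicts stand in for the real Beam pipeline protos, and every name (`build_partial_pipeline`, `expand_on_runner`, the URNs) is illustrative rather than Beam's actual API: the client records only the URN and configuration payload of a cross-language transform, and the job service or runner later asks the expansion service to fill in the subgraph.

```python
# Sketch only: dicts stand in for Beam protos; names are hypothetical.

def build_partial_pipeline():
    """Client side: runs with no Java; external transforms stay unexpanded."""
    return {
        "transforms": [
            {"urn": "beam:transform:read", "expanded": True},
            # Placeholder for a cross-language transform: only its URN and
            # configuration payload are known at construction time.
            {"urn": "beam:external:kafka:read", "expanded": False,
             "payload": {"topic": "events"}},
        ]
    }

def expand_on_runner(pipeline, expansion_service):
    """Runner/job-service side: replace each unexpanded placeholder with
    the subgraph returned by the expansion service."""
    for transform in pipeline["transforms"]:
        if not transform["expanded"]:
            transform.update(expansion_service(transform))
    return pipeline

def stub_expansion_service(transform):
    """Stand-in for a real expansion service call."""
    return {"expanded": True, "subtransforms": [transform["urn"] + "/impl"]}

pipeline = expand_on_runner(build_partial_pipeline(), stub_expansion_service)
```

The point of the sketch is the division of labor: nothing on the client side needs a Java environment, because expansion happens wherever the expansion service is reachable, which can be co-located with the runner.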
>>> On Sun, Aug 18, 2019 at 12:01 PM enrico canzonieri <[email protected]> wrote:
>>>>
>>>> I found the tracking ticket at BEAM-7966
>>>> <https://jira.apache.org/jira/browse/BEAM-7966>
>>>>
>>>> On Sun, Aug 18, 2019 at 11:59 AM enrico canzonieri <[email protected]> wrote:
>>>>>
>>>>> Is this alternative still being considered? Creating a portable jar
>>>>> sounds like a good solution to reuse the existing runner-specific
>>>>> deployment mechanisms (e.g. the Flink k8s operator) and in general
>>>>> simplify the deployment story.
>>>>>
>>>>> On Fri, Aug 9, 2019 at 12:46 AM Robert Bradshaw <[email protected]> wrote:
>>>>>>
>>>>>> The expansion service is a separate service. (The Flink jar happens
>>>>>> to bring both up.) However, there is negotiation to receive/validate
>>>>>> the pipeline options.
>>>>>>
>>>>>> On Fri, Aug 9, 2019 at 1:54 AM Thomas Weise <[email protected]> wrote:
>>>>>> >
>>>>>> > We would also need to consider cross-language pipelines that
>>>>>> > (currently) assume the interaction with an expansion service at
>>>>>> > construction time.
>>>>>> >
>>>>>> > On Thu, Aug 8, 2019, 4:38 PM Kyle Weaver <[email protected]> wrote:
>>>>>> >>
>>>>>> >> > It might also be useful to have the option to just output the
>>>>>> >> > proto and artifacts, as an alternative to the jar file.
>>>>>> >>
>>>>>> >> Sure, that wouldn't be too big a change if we were to decide to
>>>>>> >> go the SDK route.
>>>>>> >>
>>>>>> >> > For the Flink entry point we would need to allow for the job
>>>>>> >> > server to be used as a library.
>>>>>> >>
>>>>>> >> We don't need the whole job server; we only need to add a main
>>>>>> >> method to FlinkPipelineRunner [1] as the entry point, which would
>>>>>> >> basically just do the setup described in the doc and then call
>>>>>> >> FlinkPipelineRunner::run.
>>>>>> >>
>>>>>> >> [1] https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java#L53
>>>>>> >>
>>>>>> >> Kyle Weaver | Software Engineer | github.com/ibzib | [email protected]
>>>>>> >>
>>>>>> >> On Thu, Aug 8, 2019 at 4:21 PM Thomas Weise <[email protected]> wrote:
>>>>>> >>>
>>>>>> >>> Hi Kyle,
>>>>>> >>>
>>>>>> >>> It might also be useful to have the option to just output the
>>>>>> >>> proto and artifacts, as an alternative to the jar file.
>>>>>> >>>
>>>>>> >>> For the Flink entry point we would need to allow for the job
>>>>>> >>> server to be used as a library. It would probably not be too hard
>>>>>> >>> to have the Flink job constructed via the context execution
>>>>>> >>> environment, which would require no changes on the Flink side.
>>>>>> >>>
>>>>>> >>> Thanks,
>>>>>> >>> Thomas
>>>>>> >>>
>>>>>> >>> On Thu, Aug 8, 2019 at 9:52 AM Kyle Weaver <[email protected]> wrote:
>>>>>> >>>>
>>>>>> >>>> Re: Java-less/serverless solution:
>>>>>> >>>> I take it this would probably mean that we would construct the
>>>>>> >>>> jar directly from the SDK. There are advantages to this: full
>>>>>> >>>> separation of the Python and Java environments, no need for a
>>>>>> >>>> job server, and likely a simpler implementation, since we'd no
>>>>>> >>>> longer have to work within the constraints of the existing job
>>>>>> >>>> server infrastructure. The only downside I can think of is the
>>>>>> >>>> additional cost of implementing/maintaining jar creation code
>>>>>> >>>> in each SDK, but that cost may be acceptable if it's simple
>>>>>> >>>> enough.
>>>>>> >>>>
>>>>>> >>>> Kyle Weaver | Software Engineer | github.com/ibzib | [email protected]
>>>>>> >>>>
>>>>>> >>>> On Thu, Aug 8, 2019 at 9:31 AM Thomas Weise <[email protected]> wrote:
>>>>>> >>>>>
>>>>>> >>>>> On Thu, Aug 8, 2019 at 8:29 AM Robert Bradshaw <[email protected]> wrote:
>>>>>> >>>>>>
>>>>>> >>>>>> > Before assembling the jar, the job server runs to create
>>>>>> >>>>>> > the ingredients. That requires the (matching) Java
>>>>>> >>>>>> > environment on the Python developer's machine.
>>>>>> >>>>>>
>>>>>> >>>>>> We can run the job server and have it create the jar (and if
>>>>>> >>>>>> we keep the job server running we can use it to interact with
>>>>>> >>>>>> the running job). However, if the jar layout is simple enough,
>>>>>> >>>>>> there's no need to even build it from Java.
>>>>>> >>>>>>
>>>>>> >>>>>> Taken to the extreme, this is a one-shot, jar-based JobService
>>>>>> >>>>>> API. We choose a standard layout of where to put the pipeline
>>>>>> >>>>>> description and artifacts, and can "augment" an existing jar
>>>>>> >>>>>> (one that has a runner-specific main class whose entry point
>>>>>> >>>>>> knows how to read this data to kick off a pipeline as if it
>>>>>> >>>>>> were a user's driver code) into one that has a portable
>>>>>> >>>>>> pipeline packaged into it for submission to a cluster.
>>>>>> >>>>>
>>>>>> >>>>> It would be nice if the Python developer didn't have to run
>>>>>> >>>>> anything Java at all.
>>>>>> >>>>>
>>>>>> >>>>> As we just discussed offline, this could be accomplished by
>>>>>> >>>>> including the proto that is produced by the SDK in the
>>>>>> >>>>> pre-existing jar.
>>>>>> >>>>>
>>>>>> >>>>> And if the jar has an entry point that creates the Flink job
>>>>>> >>>>> in the prescribed manner [1], it can be directly submitted to
>>>>>> >>>>> the Flink REST API. That would allow for a Java-free client.
>>>>>> >>>>>
>>>>>> >>>>> [1] https://lists.apache.org/thread.html/6db869c53816f4e2917949a7c6992c2b90856d7d639d7f2e1cd13768@%3Cdev.flink.apache.org%3E
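Since a jar is just a zip file, the "augment an existing jar" step discussed above needs no Java at all; the Python standard library is enough. A minimal sketch follows; the layout paths (`BEAM-PIPELINE/pipeline.pb`, `BEAM-PIPELINE/artifacts/`) and the function name `augment_jar` are made up for illustration, since the actual layout would have to be standardized between the SDKs and the runner entry points:

```python
import io
import zipfile

# Hypothetical standard layout inside the jar; the real paths would be
# part of the agreed-upon SDK/runner contract.
PIPELINE_PATH = "BEAM-PIPELINE/pipeline.pb"
ARTIFACT_DIR = "BEAM-PIPELINE/artifacts/"

def augment_jar(base_jar, pipeline_proto, artifacts):
    """Copy an existing runner jar and embed the pipeline proto and staged
    artifacts at well-known paths, so the jar's main class can read them
    back and run the pipeline as if it were user driver code."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(base_jar)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            dst.writestr(item, src.read(item))  # copy original entries
        dst.writestr(PIPELINE_PATH, pipeline_proto)
        for name, payload in artifacts.items():
            dst.writestr(ARTIFACT_DIR + name, payload)
    return out.getvalue()

# Build a tiny stand-in "runner jar" and augment it.
base = io.BytesIO()
with zipfile.ZipFile(base, "w") as z:
    z.writestr("META-INF/MANIFEST.MF", "Main-Class: RunnerEntryPoint\n")
augmented = augment_jar(base.getvalue(), b"<serialized pipeline>",
                        {"requirements.txt": b"protobuf"})
with zipfile.ZipFile(io.BytesIO(augmented)) as z:
    names = z.namelist()
```

This is the "SDK route" Kyle describes: the per-SDK maintenance cost is roughly this much code plus whatever artifact staging requires.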
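For the "submit directly to the Flink REST API" step, submission boils down to two calls: POST /jars/upload with the jar as multipart form data, then POST /jars/:jarid/run. A sketch of building the run request (no network I/O here; the field names `entryClass`, `parallelism`, and `programArgs` follow Flink's REST API, but the helper itself is illustrative, not a complete client):

```python
import json

def run_request(base_url, jar_id, entry_class=None, parallelism=None,
                program_args=None):
    """Build the URL and JSON body for Flink's POST /jars/:jarid/run,
    where jar_id is the id returned by the earlier /jars/upload call."""
    url = "%s/jars/%s/run" % (base_url.rstrip("/"), jar_id)
    body = {}
    if entry_class is not None:
        body["entryClass"] = entry_class   # runner-specific main class
    if parallelism is not None:
        body["parallelism"] = parallelism
    if program_args is not None:
        body["programArgs"] = program_args
    return url, json.dumps(body)
```

With an entry point baked into the jar as described above, a Python client would upload the augmented jar and then issue this request, never touching a local Java environment.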
