Re: Launching a Portable Pipeline

Ankur Goenka Wed, 23 May 2018 15:23:18 -0700

Yes, JobService can be implemented by a runner and can be made available
using an endpoint.
The component reuse is mostly code reuse.
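
To illustrate the kind of code reuse meant here, below is a minimal sketch, not Beam's actual classes. All names (BaseJobService, InMemoryJobService, _submit, _cancel) are hypothetical. It assumes the shared part is the job-handle bookkeeping and that only submission and cancellation are runner specific.

```python
from abc import ABC, abstractmethod
import uuid


class BaseJobService(ABC):
    """Hypothetical shared JobService logic (an assumption, not Beam's API)."""

    def __init__(self):
        # Maps a job handle to the runner-specific job reference.
        self._jobs = {}

    def run(self, pipeline):
        # Shared bookkeeping: allocate a handle and remember the runner job.
        handle = uuid.uuid4().hex
        self._jobs[handle] = self._submit(pipeline)
        return handle

    def cancel(self, handle):
        # Shared bookkeeping: forget the handle, then delegate to the runner.
        self._cancel(self._jobs.pop(handle))

    @abstractmethod
    def _submit(self, pipeline):
        """Runner-specific translation and submission."""
        raise NotImplementedError

    @abstractmethod
    def _cancel(self, runner_job):
        """Runner-specific cancellation."""
        raise NotImplementedError


class InMemoryJobService(BaseJobService):
    """Toy runner-specific part, standing in for a real runner."""

    def __init__(self):
        super().__init__()
        self.cancelled = []

    def _submit(self, pipeline):
        return "runner-job-for-" + pipeline

    def _cancel(self, runner_job):
        self.cancelled.append(runner_job)
```

A runner that needs something fully custom would simply not inherit from the shared base and would implement the service interface directly.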


On Wed, May 23, 2018 at 3:14 PM Reuven Lax <re...@google.com> wrote:

>
>
> On Wed, May 23, 2018 at 3:09 PM Ankur Goenka <goe...@google.com> wrote:
>
>> 1. Why is JobService runner specific? Couldn't at least a good part of it
>> be reused, given that the runner specific parts are mostly in the
>> translation? Or am I missing other reasons?
>>
>> Yes, absolutely. A good chunk of it can be reused. We are reusing a few
>> components from ULR in the Flink runner. Calling JobService runner specific
>> gives a runner the freedom to have a very custom JobService if needed.
>>
>
> So you're suggesting that we should publish common JobService components
> and recommend that runners use them, but that runners are free to build
> something completely custom if they prefer?
>
>>
>> 2. What about authentication and authorisation for production runners?
>> A service that can submit/cancel pipelines is the first thing I can
>> think of being abused.
>>
>> Authentication and authorization is still an unsolved problem. To the
>> best of my knowledge, it is runner specific and any required information
>> should be part of the gRPC headers.
>>
>> On Wed, May 23, 2018 at 2:48 PM Ismaël Mejía <ieme...@gmail.com> wrote:
>>
>>> Interesting document, two questions:
>>>
>>> 1. Why is JobService runner specific? Couldn't at least a good part of it
>>> be reused, given that the runner specific parts are mostly in the
>>> translation? Or am I missing other reasons?
>>>
>>> 2. What about authentication and authorisation for production runners?
>>> A service that can submit/cancel pipelines is the first thing I can
>>> think of being abused.
>>> On Tue, May 22, 2018 at 9:40 PM Ankur Goenka <goe...@google.com> wrote:
>>>
>>> > Thank you guys for the input.
>>>
>>> > Here is the summary.
>>>
>>> > Responsibility of Beam on Job Management
>>>
>>> > Beam provides a common interface for basic job management operations
>>> called JobService. The supported operations can vary between runners.
>>>
>>>
>>> > What is JobService?
>>>
>>> > JobService is a runner specific component which implements Beam's
>>> JobService interface defined here.
>>>
>>>
>>> > What is the life cycle of a JobService?
>>>
>>> > There are 3 scenarios
>>>
>>> > With ULR, JobService is short lived and runs as long as the ULR runs. (
>>> JobService Lifespan ~= Job Lifespan )
>>>
>>> > With Production runners ( Flink, Dataflow etc), JobService can either
>>> be
>>> short lived or long lived. The choice is up to the runner.
>>>
>>> > With Production runners ( Flink, Dataflow etc) without long running
>>> JobService, SDK will spin up a local JobService.
>>>
>>>
>>> > JobService state management
>>>
>>> > The choice of state management is up to JobService implementation. The
>>> basic requirement is that JobService should be able to perform all the
>>> operations with the returned job handle.
>>>
>>> > At the very least it can be the job handle for the underlying runner
>>> job
>>> and JobService will simply proxy actions to the runner using the provided
>>> job handle.
>>>
>>> > A persistent JobService is free to provide a simple string as a
>>> JobHandle. In this case, job handle can only be used with the same job
>>> service.
>>>
>>> > A stateless, non-persistent JobService can provide an opaque blob
>>> containing all the relevant information about the job. In this case the
>>> job
>>> handle can be used with any instance of JobService with the same code.
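
The two job handle styles described above could look roughly like this. This is only a sketch under assumptions: the class names, the fields stored (runner_job_id, endpoint), and the JSON plus base64 encoding are all hypothetical and are not taken from the Beam code base.

```python
import base64
import json
import uuid


class PersistentJobService:
    """Keeps job state itself, so the handle is just a simple string key."""

    def __init__(self):
        self._jobs = {}

    def start(self, runner_job_id, endpoint):
        handle = uuid.uuid4().hex
        self._jobs[handle] = {"runner_job_id": runner_job_id, "endpoint": endpoint}
        return handle

    def lookup(self, handle):
        # Only works on the instance that issued the handle.
        return self._jobs[handle]


class StatelessJobService:
    """Keeps no state; the handle is an opaque blob carrying all job info."""

    def start(self, runner_job_id, endpoint):
        info = {"runner_job_id": runner_job_id, "endpoint": endpoint}
        blob = json.dumps(info).encode("utf-8")
        return base64.urlsafe_b64encode(blob).decode("ascii")

    def lookup(self, handle):
        # Any instance running the same code can decode the handle.
        blob = base64.urlsafe_b64decode(handle.encode("ascii"))
        return json.loads(blob.decode("utf-8"))
```

In the stateless case a second instance can resolve a handle issued by the first one, while a handle from the persistent variant is meaningless to any other instance.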
>>>
>>>
>>> > JobService code distribution and invocation when JobService is short
>>> lived
>>>
>>> > We will give an easy to run solution using docker. Docker will help in
>>> both executable distribution and providing platform independent binary.
>>>
>>> > We will also give an easy setup script with a supporting document for
>>> users who do not want to use docker on local machine.
>>>
>>>
>>> > Should Flink JobService start a local cluster for testing?
>>>
>>> > Flink JobService will be capable of submitting to a remote Flink
>>> cluster if a master URL is provided; otherwise it will execute the
>>> pipeline in an in-process Flink invocation on the same JVM.
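
The decision described above is simple enough to sketch. The function and type names below (choose_execution_target, ExecutionTarget) are hypothetical and only illustrate the rule; they are not the Flink JobService's actual configuration API.

```python
from typing import NamedTuple, Optional


class ExecutionTarget(NamedTuple):
    mode: str                  # "remote" or "in-process"
    master_url: Optional[str]  # set only for remote execution


def choose_execution_target(master_url: Optional[str]) -> ExecutionTarget:
    """Submit to a remote cluster if a master URL is given, else run in-process."""
    if master_url and master_url.strip():
        return ExecutionTarget("remote", master_url.strip())
    return ExecutionTarget("in-process", None)
```

Treating an empty or whitespace-only URL the same as a missing one is a choice made for this sketch.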
>>>
>>>
>>>
>>>
>>> > On Tue, May 22, 2018 at 12:37 PM Eugene Kirpichov <
>>> kirpic...@google.com>
>>> wrote:
>>>
>>> >> Thanks Ankur, I think there's consensus, so it's probably ready to
>>> share
>>> :)
>>>
>>> >> On Fri, May 18, 2018 at 3:00 PM Ankur Goenka <goe...@google.com>
>>> wrote:
>>>
>>> >>> Thanks for all the input.
>>> >>> I have summarized the discussions at the bottom of the document (
>>> here
>>> ).
>>> >>> Please feel free to provide comments.
>>> >>> Once we agree, I will publish the conclusion on the mailing list.
>>>
>>> >>> On Mon, May 14, 2018 at 1:51 PM Eugene Kirpichov <
>>> kirpic...@google.com>
>>> wrote:
>>>
>>> >>>> Thanks Ankur, this document clarifies a few points and raises some
>>> very important questions. I encourage everybody with a stake in
>>> Portability
>>> to take a look and chime in.
>>>
>>> >>>> +Aljoscha Krettek +Thomas Weise +Henning Rohde
>>>
>>> >>>> On Mon, May 14, 2018 at 12:34 PM Ankur Goenka <goe...@google.com>
>>> wrote:
>>>
>>> >>>>> Updated link to the document as the previous link was not working
>>> for
>>> some people.
>>>
>>>
>>> >>>>> On Fri, May 11, 2018 at 7:56 PM Ankur Goenka <goe...@google.com>
>>> wrote:
>>>
>>> >>>>>> Hi,
>>>
>>> >>>>>> Recent effort on portability has introduced JobService and
>>> ArtifactService to the Beam stack along with the SDK. This has opened up a few
>>> questions around how we start a pipeline in a portable setup (with
>>> JobService).
>>> >>>>>> I am trying to document our approach to launching a portable
>>> pipeline and take binding decisions based on the discussion.
>>> >>>>>> Please review the document and provide your feedback.
>>>
>>> >>>>>> Thanks,
>>> >>>>>> Ankur
>>>
>>
