On Wed, May 9, 2018 at 1:08 AM Romain Manni-Bucau <rmannibu...@gmail.com>
wrote:

>
>
> On Wed, May 9, 2018 at 12:57 AM Henning Rohde <hero...@google.com> wrote:
>
>> There are indeed lots of possibilities for interesting docker
>> alternatives with different tradeoffs and capabilities, but in general
>> both the runner and the SDK must support them for it to work. As
>> mentioned, docker (as used in the container contract) is meant as a
>> flexible main option but not necessarily the only option. I see no problem
>> with certain pipeline-SDK-runner combinations additionally supporting a
>> specialized setup. The pipeline can be a factor, because some transforms
>> might depend on aspects of the runtime environment -- such as system
>> libraries or shelling out to a /bin/foo.
>>
>> The worker boot code is tied to the current container contract, so
>> pre-launched workers would presumably not use that code path and would
>> not be bound by its assumptions. In particular, such a setup might want to
>> invert who initiates the connection from the SDK worker to the runner.
>> Pipeline options and global state in the SDK and user functions process
>> might make it difficult to safely reuse worker processes across pipelines,
>> but it may be doable in certain scenarios.
>>
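To make the inversion Henning describes concrete, here is a minimal sketch of
a pre-launched worker that simply opens a port and waits for the runner to
connect, instead of being handed a runner endpoint at boot time. This is not
an existing Beam API; the function name and port are made up, and a real
worker would register its Fn API services on the server.

    import grpc
    from concurrent import futures

    def serve_prelaunched_worker(port=50000):
        # Started once, ahead of any pipeline. The worker only listens;
        # the runner initiates the connection and then drives the worker
        # over the Fn API.
        server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
        # Control/data/state/logging servicers would be registered here;
        # omitted in this sketch.
        server.add_insecure_port('[::]:%d' % port)
        server.start()
        server.wait_for_termination()

    if __name__ == '__main__':
        serve_prelaunched_worker()
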
>
> This is not that hard actually, and most Java environments do it.
>
> Main concerns are 1. being tied to an implementation detail and 2. a bad
> architecture which doesn't embrace the community.
>
Could you please be more specific? Concerns about the Docker dependency have
already been addressed repeatedly in this thread.


>
>
>
>> Henning
>>
>> On Tue, May 8, 2018 at 3:51 PM Thomas Weise <t...@apache.org> wrote:
>>
>>>
>>>
>>> On Sat, May 5, 2018 at 3:58 PM, Robert Bradshaw <rober...@google.com>
>>> wrote:
>>>
>>>>
>>>> I would welcome changes to
>>>> https://github.com/apache/beam/blob/v2.4.0/model/pipeline/src/main/proto/beam_runner_api.proto#L730
>>>> that would provide alternatives to docker. One that comes to mind is "I
>>>> already brought up worker(s) for you (which could be the same process
>>>> that handled pipeline construction in testing scenarios); here's how to
>>>> connect to it/them." Another option, which would seem to appeal to you in
>>>> particular, would be "the worker code is linked into the runner's binary;
>>>> use this process as the worker" (though note that even for java-on-java,
>>>> it can be advantageous to shield the worker and runner code from each
>>>> other's environments, dependencies, and version requirements). This
>>>> latter should still likely use the FnApi to talk to itself (either over
>>>> GRPC on local ports, or possibly better via direct function calls
>>>> eliminating the RPC overhead altogether -- this is how the fast local
>>>> runner in Python works). There may be runner environments well controlled
>>>> enough that "start up the workers" could be specified as "run this
>>>> command line." We should make this environment message extensible to
>>>> alternatives other than "docker container url," though of course we don't
>>>> want the set of options to grow too large, or we lose the promise of
>>>> portability unless every runner supports every protocol.
>>>>
>>>>
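For concreteness, Robert's "extensible environment message" could surface to
pipeline authors roughly as in the sketch below: one flag selecting the kind
of environment (a docker image, a pre-launched worker, or a command line the
runner executes itself) and one carrying its configuration. The option names
and values here are illustrative only, not options that exist in the SDK as
of 2.4.0.

    from apache_beam.options.pipeline_options import PipelineOptions

    # Today: the environment is effectively "a docker container url".
    docker_opts = PipelineOptions([
        '--environment_type=DOCKER',
        '--environment_config=gcr.io/example/beam_python_sdk:2.4.0',
    ])

    # Hypothetical alternative 1: connect to pre-launched worker(s).
    external_opts = PipelineOptions([
        '--environment_type=EXTERNAL',
        '--environment_config=localhost:50000',
    ])

    # Hypothetical alternative 2: the runner starts the worker itself by
    # running a command line in a well-controlled environment.
    process_opts = PipelineOptions([
        '--environment_type=PROCESS',
        '--environment_config={"command": "/opt/my_sdk/worker_boot"}',
    ])
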
>>> The pre-launched worker would be an interesting option, which might work
>>> well for a sidecar deployment.
>>>
>>> The current worker boot code, though, assumes that the runner endpoint to
>>> phone home to is known when the process is launched. That doesn't work so
>>> well with a runner that establishes its endpoint dynamically. Also, the
>>> assumption is baked in that a worker will only serve a single pipeline
>>> (provisioning API etc.).
>>>
>>> Thanks,
>>> Thomas
>>>
>>>
>>
