On Wed, May 9, 2018 at 1:08 AM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>
> On Wed, May 9, 2018 at 00:57, Henning Rohde <hero...@google.com> wrote:
>
>> There are indeed lots of possibilities for interesting docker
>> alternatives with different tradeoffs and capabilities, but in general
>> both the runner and the SDK must support them for it to work. As
>> mentioned, docker (as used in the container contract) is meant as a
>> flexible main option but not necessarily the only option. I see no
>> problem with certain pipeline-SDK-runner combinations additionally
>> supporting a specialized setup. The pipeline can be a factor, because
>> some transforms might depend on aspects of the runtime environment --
>> such as system libraries or shelling out to a /bin/foo.
>>
>> The worker boot code is tied to the current container contract, so
>> pre-launched workers would presumably not use that code path and are
>> not bound by its assumptions. In particular, such a setup might want
>> to invert who initiates the connection from the SDK worker to the
>> runner. Pipeline options and global state in the SDK and user-function
>> process might make it difficult to safely reuse worker processes
>> across pipelines, but it is doable in certain scenarios.
>
> This is not that hard actually and most java envs do it.
>
> Main concerns are 1. being tied to an impl detail and 2. a bad
> architecture which doesn't embrace the community.

Could you please be more specific? Concerns about Docker dependency have
already been repeatedly addressed in this thread.

>> Henning
>>
>> On Tue, May 8, 2018 at 3:51 PM Thomas Weise <t...@apache.org> wrote:
>>
>>> On Sat, May 5, 2018 at 3:58 PM, Robert Bradshaw <rober...@google.com>
>>> wrote:
>>>
>>>> I would welcome changes to
>>>> https://github.com/apache/beam/blob/v2.4.0/model/pipeline/src/main/proto/beam_runner_api.proto#L730
>>>> that would provide alternatives to docker. One that comes to mind is
>>>> "I already brought up a worker (or workers) for you (which could be
>>>> the same process that handled pipeline construction in testing
>>>> scenarios); here's how to connect to it/them." Another option, which
>>>> would seem to appeal to you in particular, would be "the worker code
>>>> is linked into the runner's binary, use this process as the worker"
>>>> (though note that even for java-on-java, it can be advantageous to
>>>> shield the worker and runner code from each other's environments,
>>>> dependencies, and version requirements). This latter should still
>>>> likely use the FnApi to talk to itself (either over GRPC on local
>>>> ports, or possibly better via direct function calls eliminating the
>>>> RPC overhead altogether -- this is how the fast local runner in
>>>> Python works). There may be runner environments well controlled
>>>> enough that "start up the workers" could be specified as "run this
>>>> command line." We should make this environment message extensible to
>>>> alternatives other than "docker container url," though of course we
>>>> don't want the set of options to grow too large or we lose the
>>>> promise of portability unless every runner supports every protocol.
>>>
>>> The pre-launched worker would be an interesting option, which might
>>> work well for a sidecar deployment.
>>>
>>> The current worker boot code though makes the assumption that the
>>> runner endpoint to phone home to is known when the process is
>>> launched. That doesn't work so well with a runner that establishes
>>> its endpoint dynamically.
>>> Also, the assumption is baked in that a worker will only serve a
>>> single pipeline (provisioning API etc.).
>>>
>>> Thanks,
>>> Thomas
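
To make Robert's suggestion above a bit more concrete, one way the
environment message could be made extensible is sketched below. This is
only an illustration of the idea, not a proposal for the actual proto --
all message and field names are made up -- but it captures the
alternatives discussed in this thread (pre-launched/sidecar workers, a
command line, and a worker embedded in the runner) alongside docker:

  // Hypothetical sketch only -- not the contents of beam_runner_api.proto.
  // All message and field names below are illustrative.
  syntax = "proto3";

  package environment_sketch;

  // Instead of a bare docker container url, the environment could be a
  // oneof over the provisioning styles discussed in this thread.
  message Environment {
    oneof kind {
      DockerPayload docker = 1;      // current behavior: runner launches a container
      ProcessPayload process = 2;    // "start up the workers" == "run this command line"
      ExternalPayload external = 3;  // pre-launched worker(s), e.g. a sidecar
      EmbeddedPayload embedded = 4;  // worker code linked into the runner's binary
    }
  }

  message DockerPayload {
    string container_image = 1;
  }

  message ProcessPayload {
    string command = 1;
    repeated string args = 2;
    map<string, string> env = 3;
  }

  message ExternalPayload {
    // Address of an already-running worker (pool). This inverts who
    // initiates the connection: the runner dials the worker, so the worker
    // does not need to know the runner endpoint at launch time and could
    // serve more than one pipeline.
    string endpoint = 1;  // host:port
  }

  message EmbeddedPayload {
    // Intentionally empty: the runner process itself acts as the worker,
    // ideally still speaking the Fn API to itself (over local GRPC or via
    // direct function calls).
  }

An external payload along these lines would also sidestep the boot-code
assumptions mentioned above: the runner connects to the worker at an
endpoint it learns dynamically rather than the worker phoning home, and
nothing forces that worker to serve only a single pipeline.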