Also, note that we can still support the "simple" case. For example, if the user supplies us with a jar file (as they do now) a runner could launch it as a subprocess and communicate with it via this same Fn API, or install it in a fixed container itself--the user doesn't *need* to know about docker or manually manage containers (and indeed the Fn API could be used in-process, cross-process, cross-container, and even cross-machine).
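
To make that concrete, here's a rough sketch of what I mean by the subprocess case (purely illustrative, not what's in the PR--the jar name, the port handshake, and the class are all made up):

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

/**
 * Sketch only: a runner launches the user-supplied jar as a subprocess and
 * opens a local gRPC channel to it. The port is assumed to be agreed upon
 * out of band for this example; the actual Fn API stubs from the PR's
 * service definitions would be bound to the channel.
 */
public class SubprocessWorkerLauncher {

  public static void main(String[] args) throws Exception {
    int port = 50051;  // made-up port for this sketch

    // Launch the user's jar exactly as supplied today; it is expected to
    // start serving the Fn API on the given port.
    Process worker = new ProcessBuilder(
            "java", "-jar", "user-pipeline.jar", "--fnApiPort=" + port)
        .inheritIO()
        .start();

    // The runner connects back over plain gRPC; the same channel-based
    // wiring works in-process, cross-container, or cross-machine.
    ManagedChannel channel = ManagedChannelBuilder
        .forAddress("localhost", port)
        .usePlaintext()
        .build();

    // Generated Fn API stubs (control/data services) would be created on
    // `channel` here to delegate processing to the worker.

    channel.shutdown();
    worker.destroy();
  }
}
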
However, docker provides a nice cross-language way of specifying the environment including all dependencies (especially for languages like Python or C where the equivalent of a cross-platform, self-contained jar isn't as easy to produce) and is strictly more powerful and flexible (specifically, it isolates the runtime environment and one can even use it for local testing). Slicing a worker up like this without sacrificing performance is an ambitious goal, but essential to the story of being able to mix and match runners and SDKs arbitrarily, and I think this is a great start. (I've put a rough sketch of what specifying the container via pipeline options could look like at the bottom of this mail.)

On Fri, Jan 20, 2017 at 9:39 AM, Lukasz Cwik <[email protected]> wrote:

> You're correct, a docker container is created that contains the execution
> environment the user wants, or the user re-uses an existing one (allowing
> for a user to embed all their code/dependencies or use a container that can
> deploy code/dependencies on demand).
> A user creates a pipeline saying which docker container they want to use
> (this starts to allow for multiple container definitions within a single
> pipeline to support multiple languages, versioning, ...).
> A runner would then be responsible for launching one or more of these
> containers in a cluster manager of their choice (scaling up or down the
> number of instances depending on demand/load/...).
> A runner then interacts with the docker containers over the gRPC service
> definitions to delegate processing to them.
>
> On Fri, Jan 20, 2017 at 4:56 AM, Jean-Baptiste Onofré <[email protected]>
> wrote:
>
>> Hi Luke,
>>
>> that's really great and very promising!
>>
>> It's really ambitious but I like the idea. Just to clarify: the purpose of
>> using gRPC is that once the docker container is running, we can "interact"
>> with the container to spread and delegate processing to the docker
>> container, correct?
>> The users/devops have to set up the docker containers as a prerequisite.
>> Then, the "location" of the containers (kind of container registry) is set
>> via the pipeline options and used by gRPC?
>>
>> Thanks Luke!
>>
>> Regards
>> JB
>>
>> On 01/19/2017 03:56 PM, Lukasz Cwik wrote:
>>
>>> I have been prototyping several components towards the Beam technical
>>> vision of being able to execute an arbitrary language using an arbitrary
>>> runner.
>>>
>>> I would like to share this overview [1] of what I have been working
>>> towards. I also share this PR [2] with a proposed API, service definitions
>>> and partial implementation.
>>>
>>> 1: https://s.apache.org/beam-fn-api
>>> 2: https://github.com/apache/beam/pull/1801
>>>
>>> Please comment on the overview within this thread, and any specific code
>>> comments on the PR directly.
>>>
>>> Luke
>>>
>>
>> --
>> Jean-Baptiste Onofré
>> [email protected]
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
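
P.S. For the "user creates a pipeline saying which docker container they want to use" part above, I picture something roughly like this on the SDK side (illustrative only--the option name and default image are invented, not what the PR defines):

import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;

/**
 * Sketch: a pipeline option naming the docker image that provides the
 * execution environment. A runner would read this, launch one or more
 * instances of the image on its cluster manager of choice, and then drive
 * them over the Fn API gRPC services.
 */
public interface EnvironmentOptions extends PipelineOptions {

  @Description("Docker image containing the SDK harness and user environment")
  @Default.String("example.com/my-team/sdk-harness:latest")  // made-up image
  String getEnvironmentDockerImage();

  void setEnvironmentDockerImage(String value);
}

// Usage sketch:
//   EnvironmentOptions opts =
//       PipelineOptionsFactory.fromArgs(args).as(EnvironmentOptions.class);
//   String image = opts.getEnvironmentDockerImage();

Allowing more than one such option (or a list of environments) is what opens the door to multi-language pipelines, as Luke describes above.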
