Also, note that we can still support the "simple" case. For example,
if the user supplies us with a jar file (as they do now), a runner
could launch it as a subprocess and communicate with it via this
same Fn API or install it in a fixed container itself--the user
doesn't *need* to know about docker or manually manage containers (and
indeed the Fn API could be used in-process, cross-process,
cross-container, and even cross-machine).
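
To make this concrete, here is a rough sketch of that subprocess case: a
runner launches the user's jar and then speaks to it over gRPC. Everything
here is illustrative--the flag, port, and connection direction are made up,
and the real handshake would follow the gRPC service definitions in Luke's
PR--but it shows that the transport is just an ordinary gRPC channel:

    import io.grpc.ManagedChannel;
    import io.grpc.ManagedChannelBuilder;

    public class SubprocessHarnessSketch {
      public static void main(String[] args) throws Exception {
        // Launch the user-supplied jar as a plain subprocess; no docker involved.
        Process harness = new ProcessBuilder(
                "java", "-jar", "user-pipeline.jar",
                "--control_endpoint=localhost:50051")
            .inheritIO()
            .start();

        // Talk to it over gRPC, exactly as a runner would talk to a harness
        // running in a container or on another machine.
        ManagedChannel channel = ManagedChannelBuilder
            .forAddress("localhost", 50051)
            .usePlaintext()
            .build();

        // ... drive processing via the Fn API service stubs, then tear down ...
        channel.shutdown();
        harness.waitFor();
      }
    }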

However, docker provides a nice cross-language way of specifying the
environment, including all dependencies (especially for languages like
Python or C where the equivalent of a cross-platform, self-contained
jar isn't as easy to produce) and is strictly more powerful and
flexible (specifically it isolates the runtime environment and one can
even use it for local testing).
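
For instance, the container choice could surface to the user as nothing more
than a pipeline option. A minimal sketch (the option name is hypothetical,
not an actual Beam API):

    import org.apache.beam.sdk.options.Description;
    import org.apache.beam.sdk.options.PipelineOptions;

    /**
     * Hypothetical option: point the pipeline at the docker image that packages
     * the SDK harness together with the user's code and dependencies.
     */
    public interface ContainerEnvironmentOptions extends PipelineOptions {
      @Description("Docker image containing the SDK harness and user dependencies")
      String getHarnessContainerImage();
      void setHarnessContainerImage(String value);
    }

The runner would then be responsible for bringing that image up wherever it
runs workers, rather than the user pre-arranging containers by hand.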

Slicing a worker up like this without sacrificing performance is an
ambitious goal, but essential to the story of being able to mix and
match runners and SDKs arbitrarily, and I think this is a great start.


On Fri, Jan 20, 2017 at 9:39 AM, Lukasz Cwik <[email protected]> wrote:
> You're correct: a docker container is created that contains the execution
> environment the user wants, or the user re-uses an existing one (allowing
> for a user to embed all their code/dependencies or use a container that can
> deploy code/dependencies on demand).
> A user creates a pipeline saying which docker container they want to use
> (this starts to allow for multiple container definitions within a single
> pipeline to support multiple languages, versioning, ...).
> A runner would then be responsible for launching one or more of these
> containers in a cluster manager of their choice (scaling up or down the
> number of instances depending on demand/load/...).
> A runner then interacts with the docker containers over the gRPC service
> definitions to delegate processing to them.
>
>
> On Fri, Jan 20, 2017 at 4:56 AM, Jean-Baptiste Onofré <[email protected]>
> wrote:
>
>> Hi Luke,
>>
>> that's really great and very promising!
>>
>> It's really ambitious but I like the idea. Just to clarify: the purpose of
>> using gRPC is that, once the docker container is running, we can "interact"
>> with the container to spread and delegate processing to it, correct?
>> The users/devops have to set up the docker containers as a prerequisite.
>> Then, the "location" of the containers (kind of container registry) is set
>> via the pipeline options and used by gRPC?
>>
>> Thanks Luke !
>>
>> Regards
>> JB
>>
>>
>> On 01/19/2017 03:56 PM, Lukasz Cwik wrote:
>>
>>> I have been prototyping several components towards the Beam technical
>>> vision of being able to execute an arbitrary language using an arbitrary
>>> runner.
>>>
>>> I would like to share this overview [1] of what I have been working
>>> towards. I am also sharing this PR [2] with a proposed API, service
>>> definitions, and a partial implementation.
>>>
>>> 1: https://s.apache.org/beam-fn-api
>>> 2: https://github.com/apache/beam/pull/1801
>>>
>>> Please comment on the overview within this thread, and leave any specific
>>> code comments on the PR directly.
>>>
>>> Luke
>>>
>>>
>> --
>> Jean-Baptiste Onofré
>> [email protected]
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
