Hi! I’m Bill. I’m working with Lukasz on defining the Fn API, which will
provide guidance on the boundary between runner-specific code, and
constructs the SDK implements.
Here's our high level framing of the problem, and the type of problems we
want to address.

Concept:

Decouple runner developers from SDK authors by having a clean separation
between the execution of user definable functions (DoFns and others)
marshaled by SDK, and the responsibilities of the runner in providing a
distributed execution framework for the parallel execution of those
functions.

The vision that compels us is providing a minimal standard that enables
SDKs in all languages, and provide non-binding implementation guidance for
runner authors to help provide additional seams between the SDK and runner,
trading off increased implementation complexity for better performance
opportunities.

The concept characterizes the interface in 3 dimensions: the data format
describing function invocations and their results and side effects, which
bridges the runner and SDK; the transport of bytes that moves this data
between runner and SDK, and a specification of the execution environment in
which the SDK code runs.

Data Format: A UDF is called with arguments containing the data of its
input arguments. The UDF returns the data that is associated with the named
outputs, and additional data for counter values and TBD.

Transport: As the SDK implementation language may differ from the runner’s
language, there are different options for moving the function invocation
data between the two. SWIG is a common technique for cross-language
binding. We are investigating using GRPC as a transport, since servers and
clients exist for it in many languages, providing a useful set of potential
targets for SDK development that would not be reachable via SWIG.

Execution environment: While less of a concern for certain environments,
other runners execute in dynamically instantiated environments that may
require configuration/provisioning to support the execution of arbitrary
user code. Examples include Python code expecting dependencies to be
installed, user code expecting files staged on the local filesystem, and so
on. We are exploring using Docker containers as a mechanism for specifying
these execution environments. They are advantageous in that they solve the
deployment issues described above, and are admissible to the various
implementations of the Beam model today.

I’ll be sharing my thinking as it moves forward, but wanted to touch base
and invite comments and thoughts.

Reply via email to