Hi! I’m Bill. I’m working with Lukasz on defining the Fn API, which will provide guidance on the boundary between runner-specific code, and constructs the SDK implements. Here's our high level framing of the problem, and the type of problems we want to address.
Concept: Decouple runner developers from SDK authors by having a clean separation between the execution of user definable functions (DoFns and others) marshaled by SDK, and the responsibilities of the runner in providing a distributed execution framework for the parallel execution of those functions. The vision that compels us is providing a minimal standard that enables SDKs in all languages, and provide non-binding implementation guidance for runner authors to help provide additional seams between the SDK and runner, trading off increased implementation complexity for better performance opportunities. The concept characterizes the interface in 3 dimensions: the data format describing function invocations and their results and side effects, which bridges the runner and SDK; the transport of bytes that moves this data between runner and SDK, and a specification of the execution environment in which the SDK code runs. Data Format: A UDF is called with arguments containing the data of its input arguments. The UDF returns the data that is associated with the named outputs, and additional data for counter values and TBD. Transport: As the SDK implementation language may differ from the runner’s language, there are different options for moving the function invocation data between the two. SWIG is a common technique for cross-language binding. We are investigating using GRPC as a transport, since servers and clients exist for it in many languages, providing a useful set of potential targets for SDK development that would not be reachable via SWIG. Execution environment: While less of a concern for certain environments, other runners execute in dynamically instantiated environments that may require configuration/provisioning to support the execution of arbitrary user code. Examples include Python code expecting dependencies to be installed, user code expecting files staged on the local filesystem, and so on. We are exploring using Docker containers as a mechanism for specifying these execution environments. They are advantageous in that they solve the deployment issues described above, and are admissible to the various implementations of the Beam model today. I’ll be sharing my thinking as it moves forward, but wanted to touch base and invite comments and thoughts.
