Hello, I have been intrigued by the Fn API since the first time I heard about it, really interesting indeed, maybe you can put your ideas in a google doc, so it can be part of the Beam Technical Docs dir (and commented by others).
Ismaël On Tue, Jun 21, 2016 at 4:36 PM, Bill Neubauer <[email protected]> wrote: > Hi! I’m Bill. I’m working with Lukasz on defining the Fn API, which will > provide guidance on the boundary between runner-specific code, and > constructs the SDK implements. > Here's our high level framing of the problem, and the type of problems we > want to address. > > Concept: > > Decouple runner developers from SDK authors by having a clean separation > between the execution of user definable functions (DoFns and others) > marshaled by SDK, and the responsibilities of the runner in providing a > distributed execution framework for the parallel execution of those > functions. > > The vision that compels us is providing a minimal standard that enables > SDKs in all languages, and provide non-binding implementation guidance for > runner authors to help provide additional seams between the SDK and runner, > trading off increased implementation complexity for better performance > opportunities. > > The concept characterizes the interface in 3 dimensions: the data format > describing function invocations and their results and side effects, which > bridges the runner and SDK; the transport of bytes that moves this data > between runner and SDK, and a specification of the execution environment in > which the SDK code runs. > > Data Format: A UDF is called with arguments containing the data of its > input arguments. The UDF returns the data that is associated with the named > outputs, and additional data for counter values and TBD. > > Transport: As the SDK implementation language may differ from the runner’s > language, there are different options for moving the function invocation > data between the two. SWIG is a common technique for cross-language > binding. We are investigating using GRPC as a transport, since servers and > clients exist for it in many languages, providing a useful set of potential > targets for SDK development that would not be reachable via SWIG. > > Execution environment: While less of a concern for certain environments, > other runners execute in dynamically instantiated environments that may > require configuration/provisioning to support the execution of arbitrary > user code. Examples include Python code expecting dependencies to be > installed, user code expecting files staged on the local filesystem, and so > on. We are exploring using Docker containers as a mechanism for specifying > these execution environments. They are advantageous in that they solve the > deployment issues described above, and are admissible to the various > implementations of the Beam model today. > > I’ll be sharing my thinking as it moves forward, but wanted to touch base > and invite comments and thoughts. >
