Re: Beam Fn API

Lukasz Cwik Fri, 20 Jan 2017 15:53:10 -0800

I updated the PR description to contain the same.

I would start by looking at the API/object model definitions found in
beam_fn_api.proto
<https://github.com/lukecwik/incubator-beam/blob/fn_api/sdks/common/fn-api/src/main/proto/beam_fn_api.proto>


Then depending on your interest, look at the following:
* FnHarness.java
<https://github.com/lukecwik/incubator-beam/blob/fn_api/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java>
is the main entry point.
* org.apache.beam.fn.harness.data
<https://github.com/lukecwik/incubator-beam/tree/fn_api/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data>
contains the most interesting bits of code since it is able to multiplex a
gRPC stream into multiple logical streams of elements bound for multiple
concurrent process bundle requests. It also contains the code to take
multiple logical outbound streams and multiplex them back onto a gRPC
stream.
* org.apache.beam.runners.core
<https://github.com/lukecwik/incubator-beam/tree/fn_api/sdks/java/harness/src/main/java/org/apache/beam/runners/core>
contains additional runners akin to the DoFnRunner found in runners-core to
support sources and gRPC endpoints.

Unless your really interested in how domain sockets, epoll, nio channel
factories or how stream readiness callbacks work in gRPC, I would avoid the
packages org.apache.beam.fn.harness.channel and
org.apache.beam.fn.harness.stream. Similarly I would avoid
org.apache.beam.fn.harness.fn and org.apache.beam.fn.harness.fake as they
don't add anything meaningful to the api.

Code package descriptions:

org.apache.beam.fn.harness.FnHarness: main entry point
org.apache.beam.fn.harness.control: Control service client and individual
request handlers
org.apache.beam.fn.harness.data: Data service client and logical stream
multiplexing
org.apache.beam.runners.core: Additional runners akin to the DoFnRunner
found in runners-core to support sources and gRPC endpoints
org.apache.beam.fn.harness.logging: Logging client implementation and JUL
logging handler adapter
org.apache.beam.fn.harness.channel: gRPC channel management
org.apache.beam.fn.harness.stream: gRPC stream management
org.apache.beam.fn.harness.fn: Java 8 functional interface extensions


On Fri, Jan 20, 2017 at 1:26 PM, Kenneth Knowles <[email protected]>
wrote:

> This is awesome! Any chance you could roadmap the PR for us with some links
> into the most interesting bits?
>
> On Fri, Jan 20, 2017 at 12:19 PM, Robert Bradshaw <
> [email protected]> wrote:
>
> > Also, note that we can still support the "simple" case. For example,
> > if the user supplies us with a jar file (as they do now) a runner
> > could launch it as a subprocesses and communicate with it via this
> > same Fn API or install it in a fixed container itself--the user
> > doesn't *need* to know about docker or manually manage containers (and
> > indeed the Fn API could be used in-process, cross-process,
> > cross-container, and even cross-machine).
> >
> > However docker provides a nice cross-language way of specifying the
> > environment including all dependencies (especially for languages like
> > Python or C where the equivalent of a cross-platform, self-contained
> > jar isn't as easy to produce) and is strictly more powerful and
> > flexible (specifically it isolates the runtime environment and one can
> > even use it for local testing).
> >
> > Slicing a worker up like this without sacrificing performance is an
> > ambitious goal, but essential to the story of being able to mix and
> > match runners and SDKs arbitrarily, and I think this is a great start.
> >
> >
> > On Fri, Jan 20, 2017 at 9:39 AM, Lukasz Cwik <[email protected]>
> > wrote:
> > > Your correct, a docker container is created that contains the execution
> > > environment the user wants or the user re-uses an existing one
> (allowing
> > > for a user to embed all their code/dependencies or use a container that
> > can
> > > deploy code/dependencies on demand).
> > > A user creates a pipeline saying which docker container they want to
> use
> > > (this starts to allow for multiple container definitions within a
> single
> > > pipeline to support multiple languages, versioning, ...).
> > > A runner would then be responsible for launching one or more of these
> > > containers in a cluster manager of their choice (scaling up or down the
> > > number of instances depending on demand/load/...).
> > > A runner then interacts with the docker containers over the gRPC
> service
> > > definitions to delegate processing to.
> > >
> > >
> > > On Fri, Jan 20, 2017 at 4:56 AM, Jean-Baptiste Onofré <[email protected]
> >
> > > wrote:
> > >
> > >> Hi Luke,
> > >>
> > >> that's really great and very promising !
> > >>
> > >> It's really ambitious but I like the idea. Just to clarify: the
> purpose
> > of
> > >> using gRPC is once the docker container is running, then we can
> > "interact"
> > >> with the container to spread and delegate processing to the docker
> > >> container, correct ?
> > >> The users/devops have to setup the docker containers as prerequisite.
> > >> Then, the "location" of the containers (kind of container registry) is
> > set
> > >> via the pipeline options and used by gRPC ?
> > >>
> > >> Thanks Luke !
> > >>
> > >> Regards
> > >> JB
> > >>
> > >>
> > >> On 01/19/2017 03:56 PM, Lukasz Cwik wrote:
> > >>
> > >>> I have been prototyping several components towards the Beam technical
> > >>> vision of being able to execute an arbitrary language using an
> > arbitrary
> > >>> runner.
> > >>>
> > >>> I would like to share this overview [1] of what I have been working
> > >>> towards. I also share this PR [2] with a proposed API, service
> > definitions
> > >>> and partial implementation.
> > >>>
> > >>> 1: https://s.apache.org/beam-fn-api
> > >>> 2: https://github.com/apache/beam/pull/1801
> > >>>
> > >>> Please comment on the overview within this thread, and any specific
> > code
> > >>> comments on the PR directly.
> > >>>
> > >>> Luke
> > >>>
> > >>>
> > >> --
> > >> Jean-Baptiste Onofré
> > >> [email protected]
> > >> http://blog.nanthrax.net
> > >> Talend - http://www.talend.com
> > >>
> >
>

Re: Beam Fn API

Reply via email to