Understood, so that's a generalized abstraction for creating RPC-based
services that manage SDK harnesses (what we discussed as "external" in the
other thread). I would prefer this to be REST-based, since that makes
interfacing with other systems easier. A shell script would then probably
already suffice.
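
For instance (just a sketch; the URL and JSON field names below are
invented), such a script could simply forward the endpoints it is handed
and exit:

    #!/bin/bash
    # Hypothetical sketch of the "callback" as a plain shell script: forward
    # the Fn API endpoints that would otherwise be command line arguments to
    # an external worker manager over REST, then exit. The URL and JSON field
    # names are made up, not an agreed contract.
    CONTROL_ENDPOINT="$1"
    LOGGING_ENDPOINT="$2"
    curl -s -X POST "http://worker-manager.example.com/workers" \
        -H "Content-Type: application/json" \
        -d "{\"control_endpoint\": \"$CONTROL_ENDPOINT\", \"logging_endpoint\": \"$LOGGING_ENDPOINT\"}"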

On 27.08.18 11:23, Robert Bradshaw wrote:
I mean that rather than a command line (or docker image), a URL is
given for a gRPC (or REST or ...) endpoint that is invoked with what
would otherwise have been passed as command line arguments (e.g. the
FnAPI control plane and logging endpoints).

This could be implemented as a script that goes and makes the call and
exits, but I think this would be common enough that it'd be worth
building in, and also useful enough for testing that it should be very
lightweight.
On Mon, Aug 27, 2018 at 10:51 AM Maximilian Michels <[email protected]> wrote:

Robert, just to be clear about the "callback" proposal. Do you mean that
the process startup script listens for an RPC from the Runner to bring
up SDK harnesses as needed?

I agree it would be helpful to know the required parameters, e.g. the
Fn API network configuration you mentioned.

On 23.08.18 17:07, Robert Bradshaw wrote:
On Thu, Aug 23, 2018 at 3:47 PM Maximilian Michels <[email protected]> wrote:

   > Going down this path may start to get fairly involved, with an almost
   > endless list of features that could be requested. Instead, I would
   > suggest we keep process-based execution very simple, and specify a bash
   > script (that sets up the environment and whatever else one may want to
   > do) as the command line invocation.

Fair point. At the least, we will have to transfer the shell script to
the nodes. Anything else is up to the script.

   > I would also think it'd be really valuable to provide a "callback"
   > environment, where an RPC call is made to trigger worker creation
   > (deletion?), passing the requisite parameters (e.g. the fn api
   > endpoints).

Aren't you making up more features now? :) Couldn't this also be
handled by the shell script?

Good point :). I still think it'd be nice to make this option more
explicit, as it doesn't even require starting up (or managing) a
subprocess.

On 23.08.18 14:13, Robert Bradshaw wrote:
On Thu, Aug 23, 2018 at 1:54 PM Maximilian Michels <[email protected]> wrote:

Big +1. Process-based execution should be simple to reason about for
users.

+1. In fact, this is exactly what the Python local job server does,
with running Docker simply being a particular command line that's
passed down here.

https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/local_job_service_main.py

The implementation should not be too involved. The user has to
ensure the environment is suitable for process-based execution.

There are some minor features that we should support:

- Activating a virtual environment for Python / Adding pre-installed
libraries to the classpath

- Staging libraries, similarly to the boot code for Docker

Going down this path may start to get fairly involved, with an almost
endless list of features that could be requested. Instead, I would
suggest we keep process-based execution very simple, and specify a bash
script (that sets up the environment and whatever else one may want to
do) as the command line invocation. We could even provide a couple of
these. (The arguments to pass should be configurable).
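
As a rough sketch (the paths are placeholders, and how the worker entry
point receives its arguments is illustrative only), such a script might
look like:

    #!/bin/bash
    # Hypothetical wrapper: prepare the environment, then hand the arguments
    # the runner supplies straight through to the SDK harness. The virtualenv
    # path, classpath entry and worker entry point are placeholders.
    source /opt/beam/venv/bin/activate              # pre-built Python virtualenv
    export CLASSPATH="/opt/beam/libs/*:$CLASSPATH"  # or: pre-installed Java libs
    exec python -m apache_beam.runners.worker.sdk_worker_main "$@"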

I would also think it'd be really valuable to provide a "callback"
environment, where an RPC call is made to trigger worker creation
(deletion?), passing the requisite parameters (e.g. the fn api
endpoints). This could be useful both in a distributed system (where
it may be desirable for an external entity to actually start up the
workers) and for debugging/testing (where one could call into the same
process that submitted the job, which would execute workers on
separate threads with an already set up environment).

On 22.08.18 07:49, Henning Rohde wrote:
Agree with Luke. Perhaps something simple, prescriptive yet flexible,
such as a custom command line (defined in the environment proto) rooted at
the base of the provided artifacts and either passed the same arguments
or defined in the container contract or made available through
substitution. That way, all the restrictions/assumptions of the
execution environment become implicit and runner/deployment dependent.
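
For example (the token names are placeholders, not a proposed contract),
substitution could look roughly like:

    # Hypothetical example of a configured command line with substitution
    # tokens that the runner fills in at worker startup; token names are
    # placeholders only.
    /path/to/launch_worker.sh --id={worker_id} \
        --logging_endpoint={logging_endpoint} --control_endpoint={control_endpoint}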


On Tue, Aug 21, 2018 at 2:12 PM Lukasz Cwik <[email protected]> wrote:

       I believe supporting a simple Process environment makes sense. It
       would be best if we didn't make the Process route solve all the
       problems that Docker solves for us. In my opinion we should limit
       the Process route to assume that the execution environment:
       * has all dependencies and libraries installed
       * is of a compatible machine architecture
       * doesn't require special networking rules to be set up

       Any other suggestions for reasonable limits on a Process environment?

       On Tue, Aug 21, 2018 at 2:53 AM Ismaël Mejía <[email protected]> wrote:

            It is also worth mentioning that, apart from the
            testing/development use case, there is also the case of supporting
            people running on Hadoop distributions. There are two extra reasons
            to want a process-based version: (1) some Hadoop distributions run
            on machines with really old kernels where docker support is limited
            or nonexistent (yes, some of those run on kernel 2.6!) and (2) ops
            people may be reluctant to take on the additional operational
            overhead of enabling docker in their clusters.
            On Tue, Aug 21, 2018 at 11:50 AM Maximilian Michels <[email protected]> wrote:
             >
             > Thanks Henning and Thomas. It looks like
             >
             > a) we want to keep the Docker Job Server Docker container and
             > rely on spinning up "sibling" SDK harness containers via the
             > Docker socket. This should require few changes to the Runner
             > code.
             >
             > b) have the InProcess SDK harness as an alternative way of
             > running user code. This can be done independently of a).
             >
             > Thomas, let's sync today on the InProcess SDK harness. I've
             > created a JIRA issue: https://issues.apache.org/jira/browse/BEAM-5187
             >
             > Cheers,
             > Max
            >
             > On 21.08.18 00:35, Thomas Weise wrote:
             > > The original objective was to make test/development easier
             > > (which I think is super important for the user experience
             > > with the portable runner).
             > >
             > > From first hand experience I can confirm that dealing with
             > > Flink clusters and Docker containers for local setup is a
             > > significant hurdle for Python developers.
             > >
             > > To simplify using Flink in embedded mode, the (direct)
             > > process-based SDK harness would be a good option, especially
             > > when it can be linked to the same virtualenv that developers
             > > have already set up, eliminating extra packaging/deployment
             > > steps.
             > >
             > > Max, I would be interested to sync up on your thoughts
             > > regarding that option, since you mention you also started to
             > > work on it (see previous discussion [1], not sure if there is
             > > a JIRA for it yet). Internally we are planning to use a
             > > direct SDK harness process instead of Docker containers. For
             > > our specific needs it will work equally well for development
             > > and production, including future plans to deploy Flink TMs
             > > via Kubernetes.
             > >
             > > Thanks,
             > > Thomas
             > >
             > > [1]
             > >
             > > On Mon, Aug 20, 2018 at 3:00 PM Maximilian Michels <[email protected]> wrote:
            > >
             > >
             > >     Thanks for your suggestions. Please see below.
             > >
             > >      > Option 3) would be to map in the docker binary and
             > >      > socket to allow the containerized Flink job server to
             > >      > start "sibling" containers on the host.
             > >
             > >     Do you mean packaging Docker inside the Job Server
             > >     container and mounting /var/run/docker.sock from the host
             > >     inside the container? That looks like a bit of a hack but
             > >     for testing it could be fine.
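
For illustration, the "sibling containers" setup usually amounts to
something like the following (the image name is a placeholder):

    # Hypothetical sketch: mount the host's Docker socket into the Job Server
    # container so that the SDK harness containers it starts are launched by
    # the host's Docker daemon as "siblings" rather than nested containers.
    # The image name is a placeholder.
    docker run -v /var/run/docker.sock:/var/run/docker.sock \
        beam-flink-job-server:latest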
             > >
             > >      > notably, if the runner supports auto-scaling or similar
             > >      > non-trivial configurations, that would be difficult to
             > >      > manage from the SDK side.
             > >
             > >     You're right, it would be unfortunate if the SDK had to
             > >     deal with spinning up SDK harness/backend containers. For
             > >     non-trivial configurations it would probably require an
             > >     extended protocol.
             > >
             > >      > Option 4) We are also thinking about adding a
             > >      > process-based SDKHarness. This will avoid the
             > >      > docker-in-docker scenario.
             > >
             > >     Actually, I had started implementing a process-based SDK
             > >     harness but figured it might be impractical because it
             > >     doubles the execution path for UDF code and potentially
             > >     doesn't work with custom dependencies.
             > >
             > >      > Process based SDKHarness also has other applications
             > >      > and might be desirable in some of the production use
             > >      > cases.
             > >
             > >     True. Some users might want something more lightweight.
             > >
            > >
            >
            > --
            > Max


--
Max

