Re: Bootstrapping Beam's Job Server

Robert Bradshaw Thu, 23 Aug 2018 08:08:24 -0700

On Thu, Aug 23, 2018 at 3:47 PM Maximilian Michels <m...@apache.org> wrote:
>
>  > Going down this path may start to get fairly involved, with an almost
>  > endless list of features that could be requested. Instead, I would
>  > suggest we keep process-based execution very simple, and specify a bash
>  > script (that sets up the environment and whatever else one may want to
>  > do) as the command line invocation.
>
> Fair point. At the least, we will have to transfer the shell script to
> the nodes. Anything else is up to the script.
>
>  > I would also think it'd be really valuable to provide a "callback"
>  > environment, where an RPC call is made to trigger worker creation
>  > (deletion?), passing the requisite parameters (e.g. the fn api
>  > endpoints).
>
> Aren't you making up more features now? :) Couldn't this be also handled
> by the shell script?


Good point :). I still think it'd be nice to make this option more
explicit, as it doesn't even require starting up (or managing) a
subprocess.
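
To make the distinction concrete, here is a minimal sketch in Python of
the two environment styles discussed in this thread. It is illustrative
only: the function names, the convention of passing the Fn API endpoint
as the last command-line argument, and the use of a plain callable in
place of a real RPC are all assumptions, not existing Beam APIs.

```python
import subprocess
import threading


def start_process_worker(command, control_endpoint):
    """Process environment: run a user-supplied command line.

    `command` would typically be a bash script that sets up the
    environment and then starts the SDK harness. Passing the endpoint
    as the last argument is a hypothetical contract for this sketch.
    """
    return subprocess.Popen(list(command) + [control_endpoint])


def start_callback_worker(callback, control_endpoint):
    """Callback environment: no subprocess is started or managed here.

    `callback` stands in for an RPC to whoever actually creates the
    worker. In this sketch it is a local callable run on a thread,
    which mirrors the debugging/testing case of calling back into the
    process that submitted the job.
    """
    thread = threading.Thread(
        target=callback, args=(control_endpoint,), daemon=True)
    thread.start()
    return thread
```

With something like this, the process case could be invoked as
start_process_worker(["bash", "worker.sh"], endpoint), where worker.sh
is a hypothetical script, while the callback case needs only a function
that accepts the endpoint.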

> On 23.08.18 14:13, Robert Bradshaw wrote:
> > On Thu, Aug 23, 2018 at 1:54 PM Maximilian Michels <m...@apache.org> wrote:
> >>
> >> Big +1. Process-based execution should be simple to reason about for
> >> users.
> >
> > +1. In fact, this is exactly what the Python local job server does,
> > with running Docker simply being a particular command line that's
> > passed down here.
> >
> > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/local_job_service_main.py
> >
> >> The implementation should not be too involved. The user has to
> >> ensure the environment is suitable for process-based execution.
> >>
> >> There are some minor features that we should support:
> >>
> >> - Activating a virtual environment for Python / Adding pre-installed
> >> libraries to the classpath
> >>
> >> - Staging libraries, similarly to the boot code for Docker
> >
> > Going down this path may start to get fairly involved, with an almost
> > endless list of features that could be requested. Instead, I would
> > > suggest we keep process-based execution very simple, and specify a bash
> > script (that sets up the environment and whatever else one may want to
> > do) as the command line invocation. We could even provide a couple of
> > these. (The arguments to pass should be configurable).
> >
> > I would also think it'd be really valuable to provide a "callback"
> > environment, where an RPC call is made to trigger worker creation
> > (deletion?), passing the requisite parameters (e.g. the fn api
> > endpoints). This could be useful both in a distributed system (where
> > it may be desirable for an external entity to actually start up the
> > workers) or for debugging/testing (where one could call into the same
> > process that submitted the job, which would execute workers on
> > separate threads with an already set up environment).
> >
> >> On 22.08.18 07:49, Henning Rohde wrote:
> >>> Agree with Luke. Perhaps something simple, prescriptive yet flexible,
> >>> such as custom command line (defined in the environment proto) rooted at
> >>> the base of the provided artifacts and either passed the same arguments
> >>> or defined in the container contract or made available through
> >>> substitution. That way, all the restrictions/assumptions of the
> >>> execution environment become implicit and runner/deployment dependent.
> >>>
> >>>
> >>> On Tue, Aug 21, 2018 at 2:12 PM Lukasz Cwik <lc...@google.com> wrote:
> >>>
> >>>      I believe supporting a simple Process environment makes sense. It
> >>>      would be best if we didn't make the Process route solve all the
> >>>      problems that Docker solves for us. In my opinion we should limit
> >>>      the Process route to assume that the execution environment:
> >>>      * has all dependencies and libraries installed
> >>>      * is of a compatible machine architecture
> >>>      * doesn't require special networking rules to be set up
> >>>
> >>>      Any other suggestions for reasonable limits on a Process environment?
> >>>
> >>>      On Tue, Aug 21, 2018 at 2:53 AM Ismaël Mejía <ieme...@gmail.com> wrote:
> >>>
> >>>          It is also worth mentioning that, apart from the
> >>>          testing/development use case, there is also the case of
> >>>          supporting people running on Hadoop distributions. There are
> >>>          two extra reasons to want a process-based version: (1) some
> >>>          Hadoop distributions run on machines with really old kernels
> >>>          where Docker support is limited or nonexistent (yes, some of
> >>>          those run on kernel 2.6!) and (2) Ops people may be reluctant
> >>>          to take on the additional operational overhead of enabling
> >>>          Docker in their clusters.
> >>>          On Tue, Aug 21, 2018 at 11:50 AM Maximilian Michels
> >>>          <m...@apache.org> wrote:
> >>>           >
> >>>           > Thanks Henning and Thomas. It looks like
> >>>           >
> >>>           > a) we want to keep the Job Server Docker container and rely on
> >>>           > spinning up "sibling" SDK harness containers via the Docker
> >>>           > socket. This should require few changes to the Runner code.
> >>>           >
> >>>           > b) have the InProcess SDK harness as an alternative way of
> >>>           > running user code. This can be done independently of a).
> >>>           >
> >>>           > Thomas, let's sync today on the InProcess SDK harness. I've
> >>>           > created a JIRA issue:
> >>>           > https://issues.apache.org/jira/browse/BEAM-5187
> >>>           >
> >>>           > Cheers,
> >>>           > Max
> >>>           >
> >>>           > On 21.08.18 00:35, Thomas Weise wrote:
> >>>           > > The original objective was to make test/development easier
> >>>           > > (which I think is super important for user experience with
> >>>           > > the portable runner).
> >>>           > >
> >>>           > > From first-hand experience I can confirm that dealing with
> >>>           > > Flink clusters and Docker containers for local setup is a
> >>>           > > significant hurdle for Python developers.
> >>>           > >
> >>>           > > To simplify using Flink in embedded mode, the (direct)
> >>>           > > process-based SDK harness would be a good option, especially
> >>>           > > when it can be linked to the same virtualenv that developers
> >>>           > > have already set up, eliminating extra packaging/deployment
> >>>           > > steps.
> >>>           > >
> >>>           > > Max, I would be interested to sync up on what your thoughts
> >>>           > > are regarding that option, since you mention you also started
> >>>           > > to work on it (see previous discussion [1], not sure if there
> >>>           > > is a JIRA for it yet).
> >>>           > > Internally we are planning to use a direct SDK harness
> >>>           > > process instead of Docker containers. For our specific needs
> >>>           > > it will work equally well for development and production,
> >>>           > > including future plans to deploy Flink TMs via Kubernetes.
> >>>           > >
> >>>           > > Thanks,
> >>>           > > Thomas
> >>>           > >
> >>>           > > [1]
> >>>           > > https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E
> >>>           > >
> >>>           > > On Mon, Aug 20, 2018 at 3:00 PM Maximilian Michels
> >>>           > > <m...@apache.org> wrote:
> >>>           > >
> >>>           > >     Thanks for your suggestions. Please see below.
> >>>           > >
> >>>           > >      > Option 3) would be to map in the docker binary and
> >>>           > >      > socket to allow the containerized Flink job server to
> >>>           > >      > start "sibling" containers on the host.
> >>>           > >
> >>>           > >     Do you mean packaging Docker inside the Job Server
> >>>           > >     container and mounting /var/run/docker.sock from the host
> >>>           > >     inside the container? That looks like a bit of a hack but
> >>>           > >     for testing it could be fine.
> >>>           > >
> >>>           > >      > notably, if the runner supports auto-scaling or similar
> >>>           > >      > non-trivial configurations, that would be difficult to
> >>>           > >      > manage from the SDK side.
> >>>           > >
> >>>           > >     You're right, it would be unfortunate if the SDK would
> >>>           > >     have to deal with spinning up SDK harness/backend
> >>>           > >     containers. For non-trivial configurations it would
> >>>           > >     probably require an extended protocol.
> >>>           > >
> >>>           > >      > Option 4) We are also thinking about adding process
> >>>           > >      > based SDKHarness. This will avoid docker in docker
> >>>           > >      > scenario.
> >>>           > >
> >>>           > >     Actually, I had started implementing a process-based SDK
> >>>           > >     harness but figured it might be impractical because it
> >>>           > >     doubles the execution path for UDF code and potentially
> >>>           > >     doesn't work with custom dependencies.
> >>>           > >
> >>>           > >      > Process based SDKHarness also has other applications
> >>>           > >      > and might be desirable in some of the production use
> >>>           > >      > cases.
> >>>           > >
> >>>           > >     True. Some users might want something more lightweight.
> >>>           > >
> >>>           >
> >>>           > --
> >>>           > Max
> >>>
> >>
> >> --
> >> Max
>
> --
> Max
