I believe supporting a simple Process environment makes sense. It would be
best if we didn't make the Process route solve all the problems that Docker
solves for us. In my opinion, we should limit the scope of the Process route
by assuming that the execution environment:
* has all dependencies and libraries installed
* is of a compatible machine architecture
* doesn't require special networking rules to be set up

Any other suggestions for reasonable limits on a Process environment?
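
For concreteness, here is a rough sketch of what this could look like from the
Python SDK: the runner would simply launch the SDK harness as a local
subprocess out of the developer's existing virtualenv. This is only an
illustration for discussion; the --environment_type/--environment_config flag
names and the launcher path are assumptions on my side, not an existing Beam
API.

    # Hypothetical sketch: flag names and the launcher path are assumed.
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        '--runner=PortableRunner',
        '--job_endpoint=localhost:8099',
        # Assumed "process" environment: the runner starts the SDK harness
        # as a plain local subprocess instead of a Docker container.
        '--environment_type=PROCESS',
        # Assumed config: a harness launcher taken from the developer's
        # already-activated virtualenv, so no image build or extra
        # packaging/networking setup is needed.
        '--environment_config={"command": "/path/to/venv/bin/sdk_worker"}',
    ])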

On Tue, Aug 21, 2018 at 2:53 AM Ismaël Mejía <ieme...@gmail.com> wrote:

> It is also worth mentioning that, apart from the testing/development use
> case, there is also the case of supporting people running on Hadoop
> distributions. There are two extra reasons to want a process-based
> version: (1) some Hadoop distributions run on machines with really old
> kernels where Docker support is limited or nonexistent (yes, some of
> those run on kernel 2.6!), and (2) ops people may be reluctant to take on
> the additional operational overhead of enabling Docker in their clusters.
> On Tue, Aug 21, 2018 at 11:50 AM Maximilian Michels <m...@apache.org>
> wrote:
> >
> > Thanks Henning and Thomas. It looks like
> >
> > a) we want to keep the Job Server Docker container and rely on
> > spinning up "sibling" SDK harness containers via the Docker socket. This
> > should require few changes to the Runner code.
> >
> > b) have the InProcess SDK harness as an alternative way of running user
> > code. This can be done independently of a).
> >
> > Thomas, let's sync today on the InProcess SDK harness. I've created a
> > JIRA issue: https://issues.apache.org/jira/browse/BEAM-5187
> >
> > Cheers,
> > Max
> >
> > On 21.08.18 00:35, Thomas Weise wrote:
> > > The original objective was to make test/development easier (which I
> > > think is super important for user experience with portable runner).
> > >
> > > From first-hand experience I can confirm that dealing with Flink
> > > clusters and Docker containers for local setup is a significant hurdle
> > > for Python developers.
> > >
> > > To simplify using Flink in embedded mode, the (direct) process-based
> > > SDK harness would be a good option, especially when it can be linked to
> > > the same virtualenv that developers have already set up, eliminating
> > > extra packaging/deployment steps.
> > >
> > > Max, I would be interested to sync up on your thoughts regarding that
> > > option, since you mentioned you have also started to work on it
> > > (see previous discussion [1], not sure if there is a JIRA for it yet).
> > > Internally we are planning to use a direct SDK harness process instead
> > > of Docker containers. For our specific needs it will work equally well
> > > for development and production, including future plans to deploy Flink
> > > TMs via Kubernetes.
> > >
> > > Thanks,
> > > Thomas
> > >
> > > [1]
> > > https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E
> > >
> > > On Mon, Aug 20, 2018 at 3:00 PM Maximilian Michels <m...@apache.org> wrote:
> > >
> > >     Thanks for your suggestions. Please see below.
> > >
> > >      > Option 3) would be to map in the docker binary and socket to
> > >      > allow the containerized Flink job server to start "sibling"
> > >      > containers on the host.
> > >
> > >     Do you mean packaging Docker inside the Job Server container and
> > >     mounting /var/run/docker.sock from the host inside the container?
> > >     That looks like a bit of a hack, but for testing it could be fine.
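> > >
> > >     For illustration only (the image name below is hypothetical, and
> > >     the exact flags depend on how the job server image is built), that
> > >     setup would map the host's Docker socket and binary into the job
> > >     server container, so any containers it starts become "siblings" on
> > >     the host:
> > >
> > >       docker run \
> > >         -v /var/run/docker.sock:/var/run/docker.sock \
> > >         -v /usr/bin/docker:/usr/bin/docker \
> > >         -p 8099:8099 \
> > >         beam-flink-jobserver:latest  # hypothetical image name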
> > >
> > >      > notably, if the runner supports auto-scaling or similar
> > >      > non-trivial configurations, that would be difficult to manage
> > >      > from the SDK side.
> > >
> > >     You're right, it would be unfortunate if the SDK had to deal with
> > >     spinning up SDK harness/backend containers. For non-trivial
> > >     configurations it would probably require an extended protocol.
> > >
> > >      > Option 4) We are also thinking about adding a process-based
> > >      > SDKHarness. This will avoid the docker-in-docker scenario.
> > >
> > >     Actually, I had started implementing a process-based SDK harness
> > >     but figured it might be impractical because it doubles the execution
> > >     path for UDF code and potentially doesn't work with custom
> > >     dependencies.
> > >
> > >      > A process-based SDKHarness also has other applications and
> > >      > might be desirable in some of the production use cases.
> > >
> > >     True. Some users might want something more lightweight.
> > >
> >
> > --
> > Max
>
