Re: Bootstrapping Beam's Job Server

Thomas Weise Mon, 20 Aug 2018 15:36:01 -0700

The original objective was to make test/development easier (which I think
is super important for user experience with the portable runner).

From first-hand experience I can confirm that dealing with Flink clusters
and Docker containers for local setup is a significant hurdle for Python
developers.

To simplify using Flink in embedded mode, the (direct) process based SDK
harness would be a good option, especially when it can be linked to the
same virtualenv that developers have already set up, eliminating extra
packaging/deployment steps.
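
As a rough illustration of what "linked to the same virtualenv" could mean
in practice, here is a minimal sketch that starts a harness as a child
process using the interpreter of the currently active environment. The
module name passed in and the helper names (harness_command,
launch_harness) are hypothetical and only for illustration; this is not
the actual Beam entry point or the way the runner would necessarily wire
it up.

```python
import os
import subprocess
import sys


def harness_command(module, args=()):
    """Build a command that runs `module` with the current interpreter.

    sys.executable points at the Python of the active virtualenv, so the
    child process sees the same installed packages as the developer.
    `module` is a placeholder for whatever the harness entry point is.
    """
    return [sys.executable, "-m", module, *args]


def launch_harness(module, args=(), extra_env=None):
    """Start the (hypothetical) harness module as a plain OS process."""
    env = dict(os.environ)       # inherit the virtualenv's environment
    env.update(extra_env or {})  # e.g. endpoints handed over by the runner
    return subprocess.Popen(harness_command(module, args), env=env)
```

With something along these lines, no image has to be built or pushed;
whatever is installed in the developer's virtualenv is what the harness
process gets.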

Max, I would be interested to sync up on what your thoughts are regarding
that option since you mention you also started to work on it (see previous
discussion [1], not sure if there is a JIRA for it yet). Internally we are
planning to use a direct SDK harness process instead of Docker containers.
For our specific needs it will work equally well for development and
production, including future plans to deploy Flink TMs via Kubernetes.

Thanks,
Thomas

[1]
https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E

On Mon, Aug 20, 2018 at 3:00 PM Maximilian Michels <[email protected]> wrote:

> Thanks for your suggestions. Please see below.
>
> > Option 3) would be to map in the docker binary and socket to allow
> > the containerized Flink job server to start "sibling" containers on
> > the host.
>
> Do you mean packaging Docker inside the Job Server container and
> mounting /var/run/docker.sock from the host inside the container? That
> looks like a bit of a hack but for testing it could be fine.
>
> > notably, if the runner supports auto-scaling or similar non-trivial
> > configurations, that would be difficult to manage from the SDK side.
>
> You're right, it would be unfortunate if the SDK would have to deal with
> spinning up SDK harness/backend containers. For non-trivial
> configurations it would probably require an extended protocol.
>
> > Option 4) We are also thinking about adding process based SDKHarness.
> > This will avoid docker in docker scenario.
>
> Actually, I had started implementing a process-based SDK harness but
> figured it might be impractical because it doubles the execution path
> for UDF code and potentially doesn't work with custom dependencies.
>
> > Process based SDKHarness also has other applications and might be
> > desirable in some of the production use cases.
>
> True. Some users might want something more lightweight.
>
