Hi,

The current plan for running the SDK harness is for the runner to execute
docker to launch SDK containers, passing the service endpoints allocated by
the runner on the docker command line.
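For illustration, a rough sketch of that launch step in Python (the flag and
image names below are my assumptions modeled on the container boot contract,
not copied from the actual runner code):

    # Hypothetical sketch: the runner shells out to docker and passes its
    # dynamically allocated service endpoints on the container command line.
    import subprocess

    subprocess.check_call([
        "docker", "run", "-d",
        "beam/python-sdk-harness",        # SDK container image (assumed name)
        "--id=1",                         # worker id assigned by the runner
        "--logging_endpoint=host:50001",  # runner logging service (assumed flag)
        "--control_endpoint=host:50002",  # runner control service (assumed flag)
    ])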

In the case of the Flink runner (prototype), the service endpoints are
dynamically allocated per executable stage. There is typically one Flink
task manager (TM) running per machine. Each TM has multiple task slots, and
a subset of these task slots will run the Beam executable stages. Flink
allows multiple jobs in one TM, so depending on how users deploy, we could
have executable stages of different pipelines running in a single TM. The
prototype also has no cleanup for the SDK containers; they remain running,
orphaned, once the runner is gone.

I'm trying to find out how this approach can be augmented for deployment on
Kubernetes. Our deployments won't allow multiple jobs per task manager, so
all task slots will belong to the same pipeline context. The intent is to
deploy the SDK harness containers along with the TM in the same pod. No
assumption can be made about the order in which the containers are started,
and the SDK container won't know the connect address at startup (the
address can only be discovered after the pipeline gets deployed into the
TMs).
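To make the intended pod layout concrete, a minimal sketch using the
Kubernetes Python client (container and image names are assumptions):

    # Hypothetical pod spec: Flink TM and Beam SDK harness side by side.
    from kubernetes import client

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="flink-taskmanager"),
        spec=client.V1PodSpec(containers=[
            client.V1Container(
                name="taskmanager",
                image="flink:1.5",                # assumed Flink image/tag
                args=["taskmanager"],
            ),
            client.V1Container(
                name="python-sdk-harness",
                image="beam/python-sdk-harness",  # assumed harness image
                # Note: no endpoint can be passed here; the harness has to
                # discover the runner's address after the job is deployed.
            ),
        ]),
    )
    # client.CoreV1Api().create_namespaced_pod("default", pod) would submit it.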

I talked about that a while ago with Henning, and one idea was to set a
fixed endpoint address so that the boot code in the SDK container knows
upfront where to connect, even when that endpoint isn't available yet. This
approach may work with minimal changes to the runner and little or no
change to the SDK container (as long as the SDK is prepared to retry). The
downside is that all (parallel) task slots of the TM will use the same SDK
worker, which will likely lead to performance issues, at least with the
Python SDK that we are planning to use.
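A minimal sketch of that retry behavior, assuming gRPC and an agreed-upon
fixed address (the endpoint and interval below are made up):

    # The harness boot code blocks until the runner's control endpoint,
    # fixed by convention, becomes reachable.
    import time
    import grpc

    FIXED_CONTROL_ENDPOINT = "localhost:50000"  # agreed upon upfront

    def wait_for_endpoint(endpoint, retry_interval=5):
        while True:
            channel = grpc.insecure_channel(endpoint)
            try:
                # Succeeds once the runner side of the endpoint comes up.
                grpc.channel_ready_future(channel).result(timeout=retry_interval)
                return channel  # hand off to the normal harness control loop
            except grpc.FutureTimeoutError:
                channel.close()
                time.sleep(retry_interval)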

An alternative may be to define an SDK worker pool per pod, with a
discovery mechanism for workers to find the runner endpoints and a
coordination mechanism that distributes the dynamically allocated endpoints
(provided by the executable stage task slots) over the available workers.
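One way such a pool could work, purely as a sketch; the queue-based
hand-off and all names here are my assumptions, not an existing Beam
mechanism:

    # Per-pod pool: idle SDK workers pull dynamically allocated endpoints
    # from a local queue that the runner/TM side pushes into.
    import queue
    import threading

    endpoint_queue = queue.Queue()

    def announce_endpoint(endpoint):
        # Called on the runner/TM side when an executable stage's task slot
        # allocates a new endpoint.
        endpoint_queue.put(endpoint)

    def sdk_worker(worker_id):
        while True:
            endpoint = endpoint_queue.get()  # block until an endpoint arrives
            # connect_and_serve(endpoint) would run the usual SDK harness loop
            print("worker %d serving %s" % (worker_id, endpoint))

    for i in range(4):  # pool size, matched to the TM's parallel task slots
        threading.Thread(target=sdk_worker, args=(i,), daemon=True).start()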

Any thoughts on this? Is anyone else looking at a docker-free deployment?

Thanks,
Thomas
