Hello Beam devs,

I’ve opened a PR (https://github.com/apache/beam/pull/8982) to support
passing options/flags to the docker run command executed as part of the
portable environment workflow. I need to provide specific volumes, and
possibly other docker run options, as I refine our custom container and
workflow.

There were requests to bring this up on the mailing list to discuss
possible ways to achieve this. There's an existing PR #8828
<https://github.com/apache/beam/pull/8828>, but we took quite different
approaches. #8828 only supports mounting /tmp/ directories and has no
support for other docker run options/flags, so it wouldn't meet my needs.

I chose to expand upon the existing environment_config flag and provide the
additional docker run options there. This requires the SDK to parse them
out when building the DockerPayload protobuf. It's worth noting that what
is provided to environment_config changes depending on the
environment_type. For example, if environment_type is docker,
environment_config is currently expected to be the docker container name,
but other environment types have completely different expectations, and
each uses its own protobuf message type.

The current method (using the Python SDK) looks like this:

python -m mymodule --runner PortableRunner --job_endpoint localhost:8099
--environment_type DOCKER --environment_config MY_CONTAINER_NAME

My PR expects other run options to be provided before the container name -
similar to how you would start the container locally:

python -m mymodule --runner PortableRunner --job_endpoint localhost:8099
--environment_type DOCKER --environment_config "-v
/Volumes/mnt/foo:/Volumes/mnt/foo -v /Volumes/mnt/bar:/Volumes/mnt/bar
--user sambvfx MY_CONTAINER_NAME"
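
As a rough illustration of the parsing this implies (a sketch only, not the
code in the PR), the SDK could split the string shell-style and treat the
final token as the container name. The function name
split_environment_config is hypothetical, and the "last token is the
container" rule is an assumption made for this example:

```python
import shlex


def split_environment_config(config):
    """Split a docker environment_config string into (run_options, container).

    Hypothetical helper: assumes the container name is always the last
    token, mirroring `docker run [OPTIONS] IMAGE`.
    """
    # shlex.split honors shell-style quoting, so quoted paths that contain
    # spaces stay together as a single token.
    tokens = shlex.split(config)
    if not tokens:
        raise ValueError("environment_config must name a container")
    *run_options, container = tokens
    return run_options, container


config = (
    "-v /Volumes/mnt/foo:/Volumes/mnt/foo "
    "-v /Volumes/mnt/bar:/Volumes/mnt/bar "
    "--user sambvfx MY_CONTAINER_NAME"
)
run_options, container = split_environment_config(config)
```

A bare container name still parses to an empty option list, which is what
keeps the existing usage working under this scheme.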

The PR's feedback raises some questions that some of you may have opinions
about. Below is a hopefully faithful summary of them, with my commentary:

Should we require that environment_config be a JSON-encoded string that
mirrors the protobuf?

e.g.

--environment_config '{"image_name": "MY_CONTAINER_NAME",
"run_options": "-v /Volumes/mnt/foo:/Volumes/mnt/foo -v
/Volumes/mnt/bar:/Volumes/mnt/bar --user sambvfx"}'

I'm not a fan because it is not backwards compatible and is awkward to
provide on the command line. Users don't want to type JSON into the shell.
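
To make the compatibility concern concrete, here is a minimal sketch of
what a JSON-only parser might look like. The function name
parse_json_environment_config is hypothetical, and the keys simply follow
the example above rather than any confirmed protobuf field names:

```python
import json


def parse_json_environment_config(config):
    """Decode a JSON environment_config into (image_name, run_options).

    Hypothetical helper: the keys follow the example in this email.
    """
    payload = json.loads(config)
    return payload["image_name"], payload.get("run_options", "")


image_name, json_run_options = parse_json_environment_config(
    '{"image_name": "MY_CONTAINER_NAME", "run_options": "--user sambvfx"}'
)

# The value users pass today is a bare container name, which is not valid
# JSON, so a JSON-only parser would reject existing invocations.
try:
    parse_json_environment_config("MY_CONTAINER_NAME")
    legacy_value_accepted = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    legacy_value_accepted = False
```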

Should we not assume docker run ... is the only way to start the container?

I think any other method would likely require further changes to the
protobuf or a completely new one.

Should we provide different args for mounting volume(s) and map that to the
appropriate docker command within the beam code?

This would require a lot of docker-specific code to be included within
Beam.
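
For a sense of what that mapping code could look like, here is a small
sketch. The function volumes_to_docker_flags and the idea of passing
volumes as (host path, container path) pairs are assumptions for
illustration, not an existing Beam option:

```python
def volumes_to_docker_flags(volumes):
    """Translate (host_path, container_path) pairs into docker run flags.

    Hypothetical helper: Beam has no such option today; this only shows
    the kind of docker-specific translation that would live in the SDK.
    """
    flags = []
    for host_path, container_path in volumes:
        # `docker run -v HOST:CONTAINER` bind-mounts a host directory.
        flags.extend(["-v", "{}:{}".format(host_path, container_path)])
    return flags


volume_flags = volumes_to_docker_flags([
    ("/Volumes/mnt/foo", "/Volumes/mnt/foo"),
    ("/Volumes/mnt/bar", "/Volumes/mnt/bar"),
])
```

Every further docker run feature (user, network, and so on) would need its
own argument and its own translation along these lines.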

Any input would be appreciated.

Cheers,
Sam
