Hello Beam devs,

I've opened a PR (https://github.com/apache/beam/pull/8982) to support passing options/flags to the docker run command executed as part of the portable environment workflow. I need to provide specific volumes, and possibly other docker run options, as I refine our custom container and workflow.

There were requests to bring this up on the mailing list to discuss possible ways to achieve this. There's an existing PR, #8828 <https://github.com/apache/beam/pull/8828>, but we took quite different approaches. #8828 is limited to mounting /tmp/ directories, with no support for other docker run options/flags, so it wouldn't solve my needs.

I chose to expand upon the existing flag environment_config and provide the additional docker run options there. This requires that the SDK parse these out when building the DockerPayload protobuf. It's worth noting that what is provided to environment_config changes depending on the environment_type. For example, if environment_type is docker, environment_config is currently expected to be the docker container name, but other environment types have completely different expectations, and each uses its own protobuf message type.

The current method (using the Python SDK) looks like this:

    python -m mymodule --runner PortableRunner --job_endpoint localhost:8099 --environment_type DOCKER --environment_config MY_CONTAINER_NAME

My PR expects other run options to be provided before the container name, similar to how you would start the container locally:

    python -m mymodule --runner PortableRunner --job_endpoint localhost:8099 --environment_type DOCKER --environment_config "-v /Volumes/mnt/foo:/Volumes/mnt/foo -v /Volumes/mnt/bar:/Volumes/mnt/bar --user sambvfx MY_CONTAINER_NAME"

The PR's feedback raises some questions that some of you may have opinions about. A hopefully faithful summary of them, with my commentary, is below.

Should we require that environment_config be a JSON-encoded string that mirrors the protobuf? For example:

    --environment_config '{"image_name": "MY_CONTAINER_NAME", "run_options": "-v /Volumes/mnt/foo:/Volumes/mnt/foo -v /Volumes/mnt/bar:/Volumes/mnt/bar --user sambvfx"}'

I'm not a fan, because it is not backwards compatible and is difficult to provide on the CLI. Users don't want to type JSON into the shell.

Should we not assume docker run ... is the only way to start the container?
I think any other method would likely require further changes to the protobuf, or a completely new one.

Should we provide different args for mounting volume(s) and map those to the appropriate docker command within the Beam code?
This requires a lot of docker-specific code to be included within Beam.

Any input would be appreciated.

Cheers,
Sam
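
P.S. To make the "options before the container name" idea concrete, here is a rough sketch of how a single environment_config string could be split into docker run options and a container name. It is illustrative only and is not the code from the PR: the function name split_environment_config is hypothetical, and it assumes the container name is always the last shell-style token in the string.

```python
import shlex


def split_environment_config(config):
    """Split 'OPTIONS... NAME' into (run_options, container_name).

    Hypothetical helper for illustration; assumes the container name
    is the final shell-style token and everything before it is an
    option for `docker run`.
    """
    # shlex.split honours shell quoting, so quoted values stay intact.
    tokens = shlex.split(config)
    if not tokens:
        raise ValueError("environment_config must name a container")
    *run_options, container_name = tokens
    return run_options, container_name
```

Under that assumption, a bare container name still parses to an empty option list, so the existing usage keeps working unchanged.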