Thanks for the feedback. I added a response in-line. On Wed, Jul 17, 2019 at 4:48 AM Ankur Goenka [email protected] <http://mailto:[email protected]> wrote:
Thanks for summarizing the discussion. > > A few comments inline below: > > > On Mon, Jul 15, 2019 at 5:28 PM Sam Bourne <[email protected]> wrote: > >> Hello Beam devs, >> >> I’ve opened a PR (https://github.com/apache/beam/pull/8982) to support >> passing options/flags to the docker run command executed as part of the >> portable environment workflow. I’m in need of providing specific volumes >> and possibly other docker run options as I refine our custom container and >> workflow. >> >> There were requests to bring this up in the mailing list to discuss >> possible ways to achieve this. There’s an existing PR #8828 >> <https://github.com/apache/beam/pull/8828> but we took quite different >> approaches. #8828 is limited to only mounting /tmp/ directories with no >> support for other docker run options/flags so wouldn’t solve my needs. >> >> I chose to expand upon the existing flag environment_config and provide >> the additional docker run options there. This requires the SDK parse these >> out when building the DockerPayload protobuf. It’s worth noting that >> what is provided to environment_config changes depending on the >> environment_type. e.g. if environment_type is docker, environment_config >> is currently expected to be the docker container name, but other >> environment types have completely different expectations, and each uses its >> own protobuf message type. >> >> The current method (using python SDK) looks like this: >> >> python -m mymodule —runner PortableRunner —job_endpoint localhost:8099 >> —environment_type DOCKER —environment_config MY_CONTAINER_NAME >> >> My PR expects other run options to be provided before the container name >> - similar to how you would start the container locally: >> >> python -m mymodule —runner PortableRunner —job_endpoint localhost:8099 >> —environment_type DOCKER —environment_config “-v >> /Volumes/mnt/foo:/Volumes/mnt/foo -v /Volumes/mnt/bar:/Volumes/mnt/bar —user >> sambvfx MY_CONTAINER_NAME” >> >> The PR’s feedback raises some questions that some of you may have >> opinions about. A hopefully faithful summary of them and my commentary >> below: >> >> Should we require the environment_config be a json encoded string that >> mirrors the protobuf? >> >> e.g. >> >> --environment_config '{"image_name": "MY_CONTAINER_NAME", "run_options": “-v >> /Volumes/mnt/foo:/Volumes/mnt/foo -v /Volumes/mnt/bar:/Volumes/mnt/bar —user >> sambvfx"}' >> >> I’m not a fan due to it not being backwards compatible and difficult to >> provide to CLI. Users don’t want to type json into the shell. >> > I agree, typing JSON on command line is really messy. But I think having > meaningful parts in the config will be easier to maintain and compare. > Can we give a config file which can be read, parsed and delivered as > options to the docker environment. > Something like "--environment_config '~/my_docker_config.json/yaml'" > That does provide a better CLI experience than writing json. If we went that route we would also want the SDKs to handle either direct json or a file path. I still prefer my original proposal which maintained backwards compatibility and within the CLI mirrors how the user normally calls docker run. I think passing a user provided command to start docker might have security > issues as users might load mount an otherwise non accessible drive or > access prohibited port etc. > I’d argue it’s up to the runner to ensure the desired privileges are setup for the worker. Doesn’t running java transform directly via an embedded environment gives you everything the worker has access to as well? Should we not assume docker run ... is the only way to start the container? >> >> I think any other method would likely require further changes to the >> protobuf or a completely new one. >> > Yes I think that makes sense. However, if we add more parameters to the > docker startup then the dockerpayload protobuf can be updated to have > those. > I imagine that any changes to the protobuf would only be to support other methods of starting a docker container - simply because run_options will contain everything the user wants to pass to docker run. Should we provide different args for mounting volume(s) and map that to the >> appropriate docker command within the beam code? >> >> This requires a lot of docker specific code to be included within beam. >> >> Any input would be appreciated. >> >> Cheers, >> Sam >> >
