Hi all,
I’m a bit confused about the desire to use json for the environment_config.

It’s harder to use json on the command line, such that now we’re talking
about the value being *either* a docker image name *or* a path to a json
file (OR maybe yaml too!), which is not only less convenient than just
typing the docker ags you want, it's also IMHO a dirty/inconsistent design.

The idea of having each key of the json config map to a docker flag seems
like a maintenance quagmire with little benefit.  In that case, Beam devs
would have to maintain parity with docker options and be the arbiters of
what's "safe" and what's not, and users would have to read additional beam
documentation (which may or may not exist) to discover what keys are valid,
rather than simply doing what they know, and passing the docker args.  As
Sam points out, if security is a concern there are plenty of ways to abuse
the system already. Security should be handled at the infrastructure
deployment level, where it’s actually meaningful.

It also seems like there’s already a precedent for encoding environment
configuration as command line args. Consider the SUBPROCESS_SDK environment:

    options = PipelineOptions()
    options.view_as(PortableOptions).environment_type = \
        python_urns.SUBPROCESS_SDK
    options.view_as(PortableOptions).environment_config = \
        b'/usr/bin/python -m apache_beam.runners.worker.sdk_worker_main'

This could be encoded as json to avoid someone passing something nasty, but
luckily that was *not* the choice that was made, because I think this is a
fine design.

As a result, I think the original proposal was the most elegant and
consistent with other environment types:

--environment_type DOCKER --environment_config "-v
/Volumes/mnt/foo:/Volumes/mnt/foo --user sambvfx MY_CONTAINER_NAME"

Reply via email to