[ https://issues.apache.org/jira/browse/BEAM-8613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992035#comment-16992035 ]
Chad Dombrova edited comment on BEAM-8613 at 12/17/19 12:34 AM:
----------------------------------------------------------------
{quote}What kind of environment variables are you trying to pass here?
{quote}
We're primarily interested in configuring various libraries and applications
used by our UDFs. These each have their own set of environment variables which
typically need to be configured before modules are imported.
Another use case which we intend to explore soon is passing env vars to control
the behavior of pip in {{boot}}, for example to point it at our internal PyPI
mirror. Do you think this falls into the category of "building too much into
these (unstructured) string fields"?
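To make the pip use case concrete, here is a minimal sketch. It relies on pip's documented convention of reading options from {{PIP_<OPTION>}} environment variables, so {{--index-url}} maps to {{PIP_INDEX_URL}}. The mirror URL and the helper name {{pip_env}} are hypothetical, and whether {{boot}} passes its environment through to pip unchanged is an assumption here.

```python
import os


def pip_env(base_env, index_url):
    """Return a copy of base_env with pip pointed at a custom index.

    pip reads command-line options from PIP_<OPTION> environment variables,
    so PIP_INDEX_URL has the same effect as passing --index-url.
    """
    env = dict(base_env)
    env["PIP_INDEX_URL"] = index_url
    return env


# Hypothetical internal mirror URL, for illustration only.
MIRROR = "https://pypi.example.internal/simple"
env = pip_env(os.environ, MIRROR)
print(env["PIP_INDEX_URL"])
```

A dict like this could be passed as {{env=}} to {{subprocess.run}} when launching pip, or equivalently the variable could be set on the container itself, which is what this issue asks for.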
{quote}Is there not another way to pass this data to the operations being
performed in this container?
{quote}
Let's frame this as a user story:
"As a developer, I want to set library- and application-specific env variables
(usually third-party) in the SDK process before any affected modules are
imported, so that I can bind a particular configuration to a job."
Let's evaluate a few options:
- custom PipelineOptions: this runs too late. By the time we can read the
pipeline options, our UDF and its PCollection element types have been
unpickled, thereby importing many dependent modules.
- custom config file uploaded to artifact service: same problem as above.
- custom docker container: this is too slow. We don't want to create a new
docker container for every permutation that we might need; we want this to be
user-controlled at job submission time.
- custom docker ARGS: this is needlessly complicated. Theoretically, if we
had a custom docker container with a custom entrypoint script and the ability
to configure docker args via the DOCKER environment, we could get this to work,
but that's a lot of work just to set some env vars. Besides, we already have
the ability to set env vars for the PROCESS environment type, so doing the same
for DOCKER seems natural.
I'm not sure what other good options there are. Environment variables seem like
the most direct and generally useful approach.
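The "runs too late" problem above can be illustrated with a small sketch. It simulates a library that reads its configuration once, at import time; the library ({{fakelib}}), its variable ({{FAKELIB_MODE}}), and the helper are all hypothetical stand-ins, not anything from Beam.

```python
import os
import types

# Source of a hypothetical third-party library that reads its configuration
# from the environment once, when the module is first imported.
FAKELIB_SOURCE = '''
import os
MODE = os.environ.get("FAKELIB_MODE", "default")
'''


def import_fakelib():
    """Simulate a fresh import of the hypothetical library."""
    module = types.ModuleType("fakelib")
    exec(FAKELIB_SOURCE, module.__dict__)
    return module


# Case 1: the library is imported first (e.g. as a side effect of unpickling
# a UDF), and the variable is only set afterwards.
os.environ.pop("FAKELIB_MODE", None)
imported_before = import_fakelib()
os.environ["FAKELIB_MODE"] = "fast"

# Case 2: the variable is already set when the import happens.
imported_after = import_fakelib()

print(imported_before.MODE)  # default
print(imported_after.MODE)   # fast
```

In the first case the setting is silently ignored, which is why the variables need to exist in the SDK process environment before anything is unpickled.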
> Add environment variable support to Docker environment
> ------------------------------------------------------
>
> Key: BEAM-8613
> URL: https://issues.apache.org/jira/browse/BEAM-8613
> Project: Beam
> Issue Type: Improvement
> Components: java-fn-execution, runner-core, runner-direct
> Reporter: Nathan Rusch
> Assignee: Nathan Rusch
> Priority: Trivial
> Time Spent: 1h
> Remaining Estimate: 0h
>
> The Process environment allows specifying environment variables via a map
> field on its payload message. The Docker environment should support this same
> pattern, and forward the contents of the map through to the container runtime.
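As a rough sketch of what "forward the contents of the map through to the container runtime" could look like, the snippet below turns an environment map into {{docker run}} arguments using Docker's {{-e KEY=VALUE}} flag. The function name and image name are hypothetical, and this is not Beam's actual implementation.

```python
def docker_run_args(image, env):
    """Build a `docker run` command line that forwards env to the container.

    Each entry of the map becomes a `-e KEY=VALUE` flag. Keys are sorted so
    the resulting command is deterministic.
    """
    args = ["docker", "run"]
    for key in sorted(env):
        args += ["-e", f"{key}={env[key]}"]
    args.append(image)
    return args


# Hypothetical image name and variables, for illustration only.
cmd = docker_run_args("example/sdk-image:latest", {"B": "2", "A": "1"})
print(cmd)
```

Returning an argument list rather than a shell string avoids quoting problems when values contain spaces.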