[ 
https://issues.apache.org/jira/browse/BEAM-8613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992035#comment-16992035
 ] 

Chad Dombrova commented on BEAM-8613:
-------------------------------------

{quote}What kind of environment variables are you trying to pass here?
{quote}
We're primarily interested in configuring various libraries and applications 
used by our UDFs. These each have their own set of environment variables which 
typically need to be configured before modules are imported. 

Another use case we intend to explore soon is passing env vars to control the 
behavior of pip in {{boot}}, for example to point it at our internal PyPI 
mirror. Do you think this falls into the category of "building too much into 
these (unstructured) string fields"?
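
To make the pip case concrete: pip maps {{PIP_}}-prefixed environment variables 
to its command-line options, so an index override could be a single variable. 
A minimal sketch, assuming a hypothetical mirror URL and assuming {{boot}} 
passes its environment through to pip unchanged:

```python
import os

# Hypothetical internal mirror; not a real endpoint.
MIRROR_URL = "https://pypi.example.internal/simple"


def pip_env(base_env=None):
    """Return a copy of the environment with pip pointed at the mirror."""
    env = dict(os.environ if base_env is None else base_env)
    # pip maps PIP_INDEX_URL to its --index-url option.
    env["PIP_INDEX_URL"] = MIRROR_URL
    return env


env = pip_env({"PATH": "/usr/bin"})
```
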
{quote}Is there not another way to pass this data to the operations being 
performed in this container?
{quote}
Let's frame this as a user story:

"As a developer, I want to set library- and application-specific env variables 
(usually third-party) in the SDK process before any affected modules are 
imported, so that I can bind a particular configuration to a job."
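
The ordering constraint in this story can be shown with a small, self-contained 
sketch. The module name {{fake_udf_lib}} and the variable {{FAKE_LIB_MODE}} are 
hypothetical stand-ins for a third-party library that reads its configuration 
once at import time; nothing here is Beam API.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical library that reads its configuration once, at import time.
LIB_SOURCE = (
    "import os\n"
    "MODE = os.environ.get('FAKE_LIB_MODE', 'default')\n"
)

os.environ.pop("FAKE_LIB_MODE", None)

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "fake_udf_lib.py"), "w") as f:
        f.write(LIB_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Stands in for unpickling a UDF, which imports its dependencies.
        lib = importlib.import_module("fake_udf_lib")
        # Stands in for applying settings read later (e.g. from options).
        os.environ["FAKE_LIB_MODE"] = "production"
        # The library still holds the value it saw when it was imported.
        mode_seen = lib.MODE
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("fake_udf_lib", None)
        os.environ.pop("FAKE_LIB_MODE", None)
```

Setting the variable after the import has no effect on {{MODE}}, which is why 
the value has to be present in the process environment from the start.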

Let's evaluate a few options:
 - custom PipelineOptions: by the time we can read the pipeline options, our 
UDF and its PCollection element types have already been unpickled, which 
imports many dependent modules.
 - custom config file uploaded to the artifact service: same problem as above.
 - custom docker container: we don't want to build a new docker container for 
every permutation that we might need. We want this to be user-controlled at 
job submission time.
 - custom docker ARGS: theoretically, if we had a custom docker container with 
a custom entrypoint script and the ability to configure docker args via the 
DOCKER environment, we could get this to work. This just seems needlessly 
complicated. We already have the ability to set env vars for the PROCESS 
environment type, so doing the same for DOCKER seems natural. 
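
For comparison, here is a rough sketch of how env vars might be supplied for 
the PROCESS environment type at job submission. It assumes the Python SDK 
accepts a JSON {{environment_config}} with {{command}} and {{env}} keys; the 
boot path and the variable name are illustrative assumptions, not taken from 
this issue.

```python
import json


def process_environment_args(command, env):
    """Build pipeline flags for a PROCESS environment with env vars.

    Assumes the JSON form of --environment_config with "command" and
    "env" keys; check the SDK version in use before relying on this.
    """
    config = {"command": command, "env": dict(env)}
    return [
        "--environment_type=PROCESS",
        "--environment_config=" + json.dumps(config, sort_keys=True),
    ]


args = process_environment_args(
    "/opt/apache/beam/boot",  # assumed path to the SDK harness boot binary
    {"FAKE_LIB_MODE": "production"},  # hypothetical variable
)
```
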

I'm not sure what other good options there are. Environment variables seem like 
the most direct and generally useful approach. 

 

> Add environment variable support to Docker environment
> ------------------------------------------------------
>
>                 Key: BEAM-8613
>                 URL: https://issues.apache.org/jira/browse/BEAM-8613
>             Project: Beam
>          Issue Type: Improvement
>          Components: java-fn-execution, runner-core, runner-direct
>            Reporter: Nathan Rusch
>            Assignee: Nathan Rusch
>            Priority: Trivial
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> The Process environment allows specifying environment variables via a map 
> field on its payload message. The Docker environment should support this same 
> pattern, and forward the contents of the map through to the container runtime.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
