[ https://issues.apache.org/jira/browse/BEAM-8613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992035#comment-16992035 ]

Chad Dombrova edited comment on BEAM-8613 at 12/17/19 12:34 AM:
----------------------------------------------------------------

{quote}What kind of environment variables are you trying to pass here?
{quote}
We're primarily interested in configuring various libraries and applications 
used by our UDFs. Each of these has its own set of environment variables, which 
typically need to be set before its modules are imported. 
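The sketch below shows why the timing matters. It assumes a hypothetical library (`hypothetical_lib`, configured through a hypothetical `HYPOTHETICAL_LIB_MODE` variable) that reads its setting once at import time, as many third-party libraries do. Setting the variable after the import has no effect. Only an import that happens after the variable is set picks it up. In the SDK harness, that means the variable has to be in the environment before the worker imports user code.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical third-party module: it reads its configuration from an
# environment variable once, at import time, and caches it in a global.
MODULE_SOURCE = '''
import os
MODE = os.environ.get("HYPOTHETICAL_LIB_MODE", "default")
'''


def import_fresh(name, source):
    """Write `source` to a new temp dir and (re)import it as module `name`."""
    tmpdir = tempfile.mkdtemp()
    with open(os.path.join(tmpdir, name + ".py"), "w") as f:
        f.write(source)
    sys.path.insert(0, tmpdir)
    try:
        sys.modules.pop(name, None)  # force a real import, not a cache hit
        importlib.invalidate_caches()
        return importlib.import_module(name)
    finally:
        sys.path.remove(tmpdir)


os.environ.pop("HYPOTHETICAL_LIB_MODE", None)
lib = import_fresh("hypothetical_lib", MODULE_SOURCE)

# Setting the variable after the import does not change the cached value...
os.environ["HYPOTHETICAL_LIB_MODE"] = "production"
late_mode = lib.MODE

# ...only an import performed after the variable is set sees it.
lib = import_fresh("hypothetical_lib", MODULE_SOURCE)
early_mode = lib.MODE

print(late_mode, early_mode)  # default production
```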

Another use case which we intend to explore soon is passing env vars to control 
the behavior of pip in {{boot}}, for example to point it at our internal PyPI 
mirror. Do you think this falls into the category of "building too much into 
these (unstructured) string fields"?
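Here is a sketch of the pip case. pip reads its long options from `PIP_<OPTION_NAME>` environment variables, so `PIP_INDEX_URL` has the same effect as `--index-url`. The helper name and mirror URL below are hypothetical. The sketch only models the environment that would have to reach pip inside the container and does not reproduce {{boot}} itself.

```python
import os
import sys


def pip_install_invocation(packages, index_url):
    """Hypothetical helper: build a pip command and an environment that
    point pip at a package mirror via PIP_INDEX_URL (same as --index-url)."""
    env = dict(os.environ)
    env["PIP_INDEX_URL"] = index_url
    cmd = [sys.executable, "-m", "pip", "install", *packages]
    return cmd, env


# Hypothetical internal mirror URL, for illustration only.
cmd, env = pip_install_invocation(
    ["requests"], "https://pypi.internal.example.com/simple"
)
# subprocess.run(cmd, env=env, check=True) would then install from the mirror.
```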
{quote}Is there not another way to pass this data to the operations being 
performed in this container?
{quote}
Let's frame this as a user story:

"As a developer, I want to set library- and application-specific env variables 
(usually third-party) in the SDK process before any affected modules are 
imported, so that I can bind a particular configuration to a job."

Let's evaluate a few options:
 - custom PipelineOptions: this runs too late. By the time we can read the 
pipeline options, our UDF and its PCollection element types have been 
unpickled, thereby importing many dependent modules. 
 - custom config file uploaded to the artifact service: same problem as above.
 - custom Docker container: this is too slow. We don't want to build a new 
Docker image for every permutation that we might need; we want this to be 
user-controlled at job submission time.
 - custom Docker args: this is needlessly complicated. Theoretically, if we 
had a custom Docker container with a custom entrypoint script and the ability 
to configure Docker args via the DOCKER environment, we could get this to work, 
but that's a lot of work just to set some env vars. Besides, we already have 
the ability to set env vars for the PROCESS environment type, so doing the same 
for DOCKER seems natural. 
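For comparison, this is roughly how env vars reach a PROCESS environment from the Python SDK today, as we understand it. `--environment_config` is a JSON object whose `env` key fills the env map on the process payload. The command path and variable values below are placeholders, so check the exact keys against the SDK version in use.

```python
import json

# Placeholder values: the boot command path and the variables are
# hypothetical, not Beam defaults.
process_config = {
    "command": "/path/to/sdk/boot",
    "env": {
        "HYPOTHETICAL_LIB_MODE": "production",
        "PIP_INDEX_URL": "https://pypi.internal.example.com/simple",
    },
}

# Arguments that would be passed to the pipeline at submission time.
pipeline_args = [
    "--environment_type=PROCESS",
    "--environment_config=" + json.dumps(process_config),
]
```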

I'm not sure what other good options there are. Environment variables seem like 
the most direct and generally useful approach. 

 


was (Author: chadrik):
{quote}What kind of environment variables are you trying to pass here?
{quote}
We're primarily interested in configuring various libraries and applications 
used by our UDFs. Each of these has its own set of environment variables, which 
typically need to be set before its modules are imported. 

Another use case which we intend to explore soon is passing env vars to control 
the behavior of pip in {{boot}}, for example to point it at our internal PyPI 
mirror. Do you think this falls into the category of "building too much into 
these (unstructured) string fields"?
{quote}Is there not another way to pass this data to the operations being 
performed in this container?
{quote}
Let's frame this as a user story:

"As a developer, I want to set library- and application-specific env variables 
(usually third-party) in the SDK process before any affected modules are 
imported, so that I can bind a particular configuration to a job."

Let's evaluate a few options:
 - custom PipelineOptions: by the time we can read the pipeline options, our 
UDF and its PCollection element types have been unpickled, thereby importing 
many dependent modules.
 - custom config file uploaded to the artifact service: same problem as above.
 - custom Docker container: we don't want to build a new Docker image for 
every permutation that we might need; we want this to be user-controlled at job 
submission time.
 - custom Docker args: theoretically, if we had a custom Docker container with a 
custom entrypoint script and the ability to configure Docker args via the 
DOCKER environment, we could get this to work. This just seems needlessly 
complicated. We already have the ability to set env vars for the PROCESS 
environment type, so doing the same for DOCKER seems natural. 

I'm not sure what other good options there are. Environment variables seem like 
the most direct and generally useful approach. 

 

> Add environment variable support to Docker environment
> ------------------------------------------------------
>
>                 Key: BEAM-8613
>                 URL: https://issues.apache.org/jira/browse/BEAM-8613
>             Project: Beam
>          Issue Type: Improvement
>          Components: java-fn-execution, runner-core, runner-direct
>            Reporter: Nathan Rusch
>            Assignee: Nathan Rusch
>            Priority: Trivial
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> The Process environment allows specifying environment variables via a map 
> field on its payload message. The Docker environment should support this same 
> pattern, and forward the contents of the map through to the container runtime.
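Here is a minimal sketch of the proposed forwarding, written in Python for illustration only. The runner-side code that actually launches the container lives in the Java components listed above. `DockerPayload` here is a hypothetical stand-in extended with the proposed `env` map. The argument list is also simplified, since a real launcher passes further `docker run` options. Each map entry becomes a `docker run --env KEY=VALUE` flag, and sorting the keys only makes the command line deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DockerPayload:
    """Hypothetical stand-in for the Docker environment payload, extended
    with the env map this issue proposes (mirroring the Process payload)."""
    container_image: str
    env: Dict[str, str] = field(default_factory=dict)


def docker_run_args(payload: DockerPayload) -> List[str]:
    """Forward the env map to the container runtime as `--env KEY=VALUE`."""
    args = ["docker", "run"]
    for key, value in sorted(payload.env.items()):
        args += ["--env", f"{key}={value}"]
    args.append(payload.container_image)
    return args


payload = DockerPayload(
    container_image="example/sdk-image:latest",  # hypothetical image name
    env={
        "PIP_INDEX_URL": "https://pypi.internal.example.com/simple",
        "HYPOTHETICAL_LIB_MODE": "production",
    },
)
args = docker_run_args(payload)
print(" ".join(args))
```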



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
