[ https://issues.apache.org/jira/browse/AIRFLOW-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16993869#comment-16993869 ]

Joseph McCartin commented on AIRFLOW-5744:
------------------------------------------

The fix is fairly simple, but it is unclear in which cases the '_env_vars' 
variable should be handed down to the Popen process.

*yarn:* [from the 
docs|https://spark.apache.org/docs/latest/running-on-yarn.html] _"Unlike other 
cluster managers supported by Spark in which the master’s address is specified 
in the --master parameter, in YARN mode the ResourceManager’s address is picked 
up from the Hadoop configuration."_ This configuration is located via one or 
more of the env vars (such as HADOOP_CONF_DIR).

*k8s:* the master is set in the spark-submit arguments in the form 
_k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>_, and not in the 
Hadoop configuration [link to 
documentation|https://spark.apache.org/docs/latest/running-on-kubernetes.html].

To minimise disruption and avoid unwanted environment variables being present 
at runtime, it is probably best to add this only for the yarn case; it should 
be trivial to extend it to the k8s case in the future.

> Environment variables not correctly set in Spark submit operator
> ----------------------------------------------------------------
>
>                 Key: AIRFLOW-5744
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5744
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: contrib, operators
>    Affects Versions: 1.10.5
>            Reporter: Joseph McCartin
>            Priority: Trivial
>
> AIRFLOW-2380 added support for setting environment variables at runtime for 
> the SparkSubmitOperator. The intention was to allow for dynamic configuration 
> paths (such as HADOOP_CONF_DIR). The pull request, however, only made these 
> env vars available at runtime when a standalone cluster and a client deploy 
> mode were chosen. For kubernetes and yarn modes, the env vars would be sent 
> to the driver via the spark argument _spark.yarn.appMasterEnv_ (and the 
> equivalent for k8s).
> If one wishes to dynamically set the yarn master address (via a 
> _yarn-site.xml_ file), then one or more environment variables need to be 
> present at runtime, and this is not currently done.
> The SparkSubmitHook class var `_env` is assigned the `_env_vars` variable 
> from the SparkSubmitOperator in the `_build_spark_submit_command` method. 
> When running in YARN mode, however, this assignment does not happen as it 
> should, and therefore `_env` is not passed to the Popen process.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)