Joseph McCartin created AIRFLOW-5744:
----------------------------------------
Summary: Environment variables not correctly set in Spark submit
operator
Key: AIRFLOW-5744
URL: https://issues.apache.org/jira/browse/AIRFLOW-5744
Project: Apache Airflow
Issue Type: Bug
Components: contrib, operators
Affects Versions: 1.10.5
Reporter: Joseph McCartin
AIRFLOW-2380 added support for setting environment variables at runtime for the
SparkSubmitOperator. This makes it possible to set the Hadoop configuration
paths (such as YARN_CONF_DIR) dynamically, for example when the previous step
has just created the Spark cluster.
The expected behaviour is that the `_build_spark_submit_command` method assigns
the `_env_vars` value supplied by the SparkSubmitOperator to the
SparkSubmitHook attribute `_env`. When running in YARN mode, however, this
assignment does not happen, so `_env` is never passed to the Popen process.
This currently only occurs when the deploy_mode is 'cluster' (yarn and cluster
deploy modes are possible).
One can reproduce this by replacing the real spark-submit executable with a
bash script that simply prints its environment variables.
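The reproduction idea can be sketched without Airflow or Spark. The snippet below is only an illustration: it uses a one-line Python child process as a hypothetical stand-in for the bash script, and `run_fake_submit` and `FAKE_SPARK_SUBMIT` are names invented here. Merging the extra variables into a copy of `os.environ` is a choice made for this sketch, not necessarily what the hook does.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for the spark-submit executable: it ignores its
# arguments and prints one environment variable so we can see what it received.
FAKE_SPARK_SUBMIT = [
    sys.executable,
    "-c",
    "import os; print(os.environ.get('YARN_CONF_DIR', '<unset>'))",
]


def run_fake_submit(env_vars=None):
    """Launch the stand-in with optional extra env vars; return what it printed."""
    # env=None makes the child inherit the parent's environment unchanged,
    # which mirrors a hook whose `_env` was never assigned.
    env = None
    if env_vars is not None:
        env = dict(os.environ, **env_vars)
    proc = subprocess.Popen(
        FAKE_SPARK_SUBMIT + ["--master", "yarn", "app.py"],
        stdout=subprocess.PIPE,
        env=env,
        universal_newlines=True,
    )
    out, _ = proc.communicate()
    return out.strip()
```

Calling `run_fake_submit()` shows what the child sees when nothing is forwarded, while `run_fake_submit({"YARN_CONF_DIR": "/tmp/conf"})` shows the value arriving once it is passed through `env`.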
I have confirmed that adding the line {{self._env = self._env_vars}} after
line 244 in spark_submit_hook.py correctly propagates these environment
variables.
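To make the effect of that one-line change concrete, here is a toy model of the branching described in this report. It is not the Airflow source: `HookModel` and its `patched` flag are hypothetical, and the forwarding of variables as `--conf spark.yarn.appMasterEnv.*` in the YARN branch is an assumption about how the hook treats YARN, included only so the branch has something to do.

```python
class HookModel:
    """Toy model of the env handling described above; not the Airflow source."""

    def __init__(self, env_vars, is_yarn, patched):
        self._env_vars = env_vars
        self._is_yarn = is_yarn
        self._patched = patched  # hypothetical switch: apply the proposed fix?
        self._env = None  # what would later be handed to Popen

    def build_command(self):
        cmd = ["spark-submit"]
        if self._env_vars and self._is_yarn:
            # Assumed behaviour: variables are forwarded to the YARN
            # application master as Spark configuration entries.
            for key, value in self._env_vars.items():
                cmd += ["--conf", "spark.yarn.appMasterEnv.{}={}".format(key, value)]
            if self._patched:
                # The proposed one-line fix: also expose the variables
                # to the local spark-submit process.
                self._env = self._env_vars
        elif self._env_vars:
            self._env = self._env_vars
        return cmd
```

With `patched=False` and `is_yarn=True`, `_env` stays `None` after `build_command()`, which matches the reported symptom. With `patched=True` it holds the operator's variables.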