Joseph McCartin created AIRFLOW-5744:
----------------------------------------

             Summary: Environment variables not correctly set in Spark submit operator
                 Key: AIRFLOW-5744
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5744
             Project: Apache Airflow
          Issue Type: Bug
          Components: contrib, operators
    Affects Versions: 1.10.5
            Reporter: Joseph McCartin


AIRFLOW-2380 added support for setting environment variables at runtime for the
SparkSubmitOperator. This allows one to set Hadoop configuration paths (such as
YARN_CONF_DIR) dynamically, for example when a previous step in the workflow
creates the Spark cluster.

The expected behaviour is that the `_build_spark_submit_command` method assigns
the `_env_vars` passed in from the SparkSubmitOperator to the SparkSubmitHook
attribute `_env`. When running in YARN mode, however, this assignment does not
happen, so `_env` is never passed to the Popen process. This currently occurs
only when the deploy_mode is 'cluster' (yarn and cluster deploy modes are
possible).

One can reproduce this by replacing the real spark-submit executable with a bash
script that prints the environment variables it receives.
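A minimal reproduction harness along those lines might look like the sketch
below. It is not part of Airflow: `make_stub` and `child_env` are hypothetical
helpers, the stub uses /bin/sh and `env`, and layering the extra variables over
a copy of the parent environment is an assumption about how the hook builds
Popen's `env` argument. The stub is launched directly, the way a hook would
launch spark-submit.

```python
import os
import stat
import subprocess
import tempfile

# Stub standing in for the real spark-submit binary: it ignores its
# arguments and dumps the environment it was started with.
STUB = "#!/bin/sh\nenv\n"


def make_stub(directory):
    """Write an executable fake spark-submit into `directory`; return its path."""
    path = os.path.join(directory, "spark-submit")
    with open(path, "w") as fh:
        fh.write(STUB)
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def child_env(binary, extra_env=None):
    """Launch `binary` and parse the environment it prints.

    With `extra_env`, the variables are layered over a copy of the parent
    environment (assumption: mirrors the hook); without it, Popen simply
    inherits the parent environment, which is what happens when `_env`
    was never set.
    """
    env = None
    if extra_env:
        env = os.environ.copy()
        env.update(extra_env)
    proc = subprocess.Popen([binary, "--master", "yarn"], env=env,
                            stdout=subprocess.PIPE, universal_newlines=True)
    out, _ = proc.communicate()
    # Keep only KEY=VALUE lines; split on the first '=' only.
    return dict(line.split("=", 1) for line in out.splitlines() if "=" in line)


if __name__ == "__main__":
    wanted = {"YARN_CONF_DIR": "/tmp/hypothetical-yarn-conf"}
    with tempfile.TemporaryDirectory() as tmp:
        stub = make_stub(tmp)
        # Bug path: env_vars never reach Popen.
        print(child_env(stub).get("YARN_CONF_DIR"))
        # Fixed path: env_vars are passed to Popen and reach the child.
        print(child_env(stub, wanted).get("YARN_CONF_DIR"))
```

Comparing the two printed values shows whether the variables actually reach
the spark-submit process.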

I have confirmed that adding the line {{self._env = self._env_vars}} after
line 244 in spark_submit_hook.py correctly propagates these environment
variables.
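To illustrate the mechanism and the effect of the proposed line, here is a
hypothetical, heavily simplified model. It is not Airflow's SparkSubmitHook:
the class name `SimplifiedSubmitHook` and the `apply_fix` flag are invented for
illustration, other branches (such as standalone cluster mode) are omitted, and
forwarding the variables to YARN as `spark.yarn.appMasterEnv.*` settings is an
assumption about the surrounding code.

```python
class SimplifiedSubmitHook:
    """Hypothetical stand-in for SparkSubmitHook; not Airflow's real code."""

    def __init__(self, master, env_vars=None, apply_fix=False):
        self._master = master
        self._env_vars = env_vars or {}
        self._apply_fix = apply_fix  # toggles the one-line fix from the report
        self._env = None  # what would later be handed to Popen(env=...)

    def build_command(self):
        cmd = ["spark-submit", "--master", self._master]
        if self._env_vars and self._master.startswith("yarn"):
            # Assumption: on YARN the variables are forwarded to the
            # application master as spark.yarn.appMasterEnv.* settings...
            for key, value in self._env_vars.items():
                cmd += ["--conf",
                        "spark.yarn.appMasterEnv.{}={}".format(key, value)]
            # ...but the spark-submit client process itself only sees them
            # if _env is populated, which is what the proposed line does.
            if self._apply_fix:
                self._env = self._env_vars
        elif self._env_vars:
            # Non-YARN masters (other branches omitted): env goes to Popen.
            self._env = self._env_vars
        return cmd


if __name__ == "__main__":
    env_vars = {"YARN_CONF_DIR": "/etc/hadoop/conf"}  # hypothetical path
    for fix in (False, True):
        hook = SimplifiedSubmitHook("yarn", env_vars, apply_fix=fix)
        hook.build_command()
        print("apply_fix={}: Popen env = {}".format(fix, hook._env))
```

In this model, only the YARN branch leaves `_env` unset, which matches the
symptom described above; the single assignment restores the behaviour that
the other branch already has.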



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
