cadet354 opened a new issue #15280:
URL: https://github.com/apache/airflow/issues/15280
**Apache Airflow version**: v2.0.1
**Environment**:
- **Cloud provider or hardware configuration**:
- bare metal
- **OS** (e.g. from /etc/os-release):
- Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-65-generic x86_64)
- **Kernel** (e.g. `uname -a`):
- Linux 5.4.0-65-generic #73-Ubuntu SMP Mon Jan 18 17:25:17 UTC 2021 x86_64
x86_64 x86_64 GNU/Linux
- **Install tools**:
- miniconda
- **Others**:
- python 3.7
- hadoop 2.9.2
- spark 2.4.7
**What happened**:
I have two problems:
1. When a DAG starts a spark-submit job on YARN with deploy_mode="cluster",
Airflow doesn't track the driver's state.
Therefore, when the job fails on YARN, the DAG's state remains "running".
2. When I manually stop a DAG's job, for example by marking it as "failed",
the same job keeps running on the YARN cluster.
This error occurs because an empty environment is passed to subprocess.Popen,
so the Hadoop bin directory is missing from PATH:
ERROR - [Errno 2] No such file or directory: 'yarn': 'yarn'
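A minimal sketch (not Airflow code) of why the empty environment breaks the `yarn` lookup, and what the merged environment restores:

```python
import os

# Sketch, not Airflow code: subprocess.Popen resolves executables via the
# environment's PATH. With env=None the child inherits os.environ; with an
# explicit empty mapping the lookup falls back to os.defpath (typically just
# /bin and /usr/bin), so a Hadoop bin directory on the parent's PATH is no
# longer searched and 'yarn' cannot be found.
inherited = os.get_exec_path(None)  # directories taken from the parent's PATH
empty_env = os.get_exec_path({})    # fallback directories from os.defpath

# The proposed fix merges the parent's environment with any user-supplied
# variables (here an empty dict stands in for self._env) before Popen:
env = {**os.environ.copy(), **{}}
assert env.get("PATH") == os.environ.get("PATH")
```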
**What you expected to happen**:
In the first case, the task should move to the "failed" state.
In the second case, the task should be stopped on the YARN cluster.
**How to reproduce it**:
To reproduce the first issue: start spark_submit on the YARN cluster with
deploy="cluster" and master="yarn", then kill the task in the YARN UI. In
Airflow the task state remains "running".
To reproduce the second issue: start spark_submit on the YARN cluster with
deploy="cluster" and master="yarn", then manually change the running job's
state to "failed". On the Hadoop cluster the same job remains running.
**Anything else we need to know**:
I propose the following changes to
airflow/providers/apache/hooks/SparkSubmitHook.py:
1. line 201:
```python
return 'spark://' in self._connection['master'] and \
    self._connection['deploy_mode'] == 'cluster'
```
change to
```python
return ('spark://' in self._connection['master'] or
        self._connection['master'] == "yarn") and \
    (self._connection['deploy_mode'] == 'cluster')
```
2. line 659:
```python
env = None
```
change to
```python
env = {**os.environ.copy(), **(self._env if self._env else {})}
```
Applying this patch solved the above issues.
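For illustration, the first change can be exercised outside Airflow with a plain dict standing in for `self._connection` (a hypothetical sketch, not the hook itself):

```python
def should_track_driver_status(connection):
    # Mirrors the proposed check: track the driver for standalone-cluster
    # masters (spark://) and also when master is "yarn", in both cases only
    # with cluster deploy mode. `connection` stands in for self._connection.
    return ('spark://' in connection['master'] or
            connection['master'] == 'yarn') and \
        (connection['deploy_mode'] == 'cluster')

# YARN cluster mode is now tracked; client mode is not.
assert should_track_driver_status({'master': 'yarn', 'deploy_mode': 'cluster'})
assert not should_track_driver_status({'master': 'yarn', 'deploy_mode': 'client'})
assert should_track_driver_status(
    {'master': 'spark://spark-host:6066', 'deploy_mode': 'cluster'})
```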