EugeneMSOff opened a new issue, #41324: URL: https://github.com/apache/airflow/issues/41324
### Apache Airflow Provider(s)

apache-spark

### Versions of Apache Airflow Providers

4.9.0

### Apache Airflow version

2.9.3

### Operating System

Debian GNU/Linux 12 (bookworm)

### Deployment

Docker-Compose

### Deployment details

_No response_

### What happened

I use spark-submit with the `--master yarn --deploy-mode cluster` parameters, and I want the application on the cluster to be killed when the SparkSubmitOperator is terminated. SparkSubmitHook has an on_kill() method, which runs a `yarn application -kill` command: https://github.com/apache/airflow/blob/45658a8963761ce8a565b481156c847e493fce67/airflow/providers/apache/spark/hooks/spark_submit.py#L709

But this does not work, because there is no such binary in the PATH: my Airflow instance runs on a host without a Hadoop installation. The hook docstring only mentions `spark-submit` as required; there is no word about yarn: https://github.com/apache/airflow/blob/45658a8963761ce8a565b481156c847e493fce67/airflow/providers/apache/spark/hooks/spark_submit.py#L42

### What you think should happen instead

As an option, change the state of the application via the YARN ResourceManager REST API (see the sketch at the end of this issue): https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html

### How to reproduce

Use the Spark connection type with deploy mode `cluster` and empty host/port fields. Run a SparkSubmitOperator and mark the task state as "failed".

### Anything else

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
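For illustration, here is a minimal sketch of what the REST-based kill could look like, not a finished implementation. The function name `kill_yarn_application`, the `rm_address` parameter, and the example values are hypothetical; in a real change, the ResourceManager address would presumably come from the Airflow connection, and the application id from the spark-submit output the hook already parses. The endpoint itself (`PUT /ws/v1/cluster/apps/{app_id}/state` with body `{"state": "KILLED"}`) is the one documented in the ResourceManager REST API page linked above.

```python
import requests


def kill_yarn_application(rm_address: str, app_id: str, timeout: int = 10) -> bool:
    """Ask the YARN ResourceManager to kill an application via its REST API.

    Hypothetical helper: `rm_address` (host:port of the RM web UI) and
    `app_id` would need to be supplied by the hook.
    """
    url = f"http://{rm_address}/ws/v1/cluster/apps/{app_id}/state"
    response = requests.put(url, json={"state": "KILLED"}, timeout=timeout)
    # Per the RM REST API docs: 200 means the app is already in the target
    # state, 202 means the state change was accepted and is in progress.
    return response.status_code in (200, 202)


# Hypothetical usage, e.g. from SparkSubmitHook.on_kill():
# kill_yarn_application("resourcemanager:8088", "application_1721000000000_0001")
```

One caveat with this approach: on a Kerberized cluster the request would additionally need SPNEGO authentication, which the `yarn` CLI handles transparently, so the REST path would likely have to support that as well.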
