ottomata opened a new issue, #30064: URL: https://github.com/apache/airflow/issues/30064
### Apache Airflow Provider(s)

apache-spark

### Versions of Apache Airflow Providers

4.0.0

### Apache Airflow version

2.5.1

### Operating System

Debian GNU/Linux 10 (buster)

### Deployment

Other

### Deployment details

_No response_

### What happened

In airflow-providers-apache-spark 4.0.0, [the value of spark_binary was hardcoded to be restricted](https://github.com/apache/airflow/commit/93589288156d56aff4b1f822b77695e3c58e4568) to only either 'spark-submit' or 'spark2-submit'. What was the reason for this?

At the Wikimedia Foundation, we install the spark 3 binary as 'spark3-submit'. This change in airflow spark 4.0.0 has broken some of our dags, making us resort to things like [this](https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/commit/b962f1501a82fc9bc8e612f2669c68357894e23d#note_20733).

### What you think should happen instead

We'd submit a patch to expand the restriction list to include 'spark3-submit', but we aren't sure why this was done in the first place. I understand the reasoning for removing `spark_home`, but it seems strange to have a `spark_binary` parameter and restrict it to these two values.

Can we undo this? If not, should we submit a patch to add spark3-submit to the list?

### How to reproduce

Set `spark_binary` to 'spark3-submit'

### Anything else

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
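
For illustration, the kind of allow-list check described in this issue can be sketched roughly as below. This is a minimal stand-alone sketch, not the provider's actual code: the names `ALLOWED_SPARK_BINARIES` and `validate_spark_binary`, and the choice of `ValueError`, are assumptions made for the example. It shows why 'spark3-submit' is rejected under a two-value list and how extending the list would accept it.

```python
# Hypothetical sketch of an allow-list check on the spark binary name.
# The names and the exception type are illustrative assumptions, not the
# implementation in airflow-providers-apache-spark.
ALLOWED_SPARK_BINARIES = ("spark-submit", "spark2-submit")


def validate_spark_binary(spark_binary, allowed=ALLOWED_SPARK_BINARIES):
    """Return spark_binary if it is in the allow-list, otherwise raise ValueError."""
    if spark_binary not in allowed:
        raise ValueError(
            f"spark_binary must be one of {list(allowed)}, got {spark_binary!r}"
        )
    return spark_binary


# With only the two permitted values, 'spark3-submit' is rejected.
try:
    validate_spark_binary("spark3-submit")
    rejected = False
except ValueError:
    rejected = True

# The patch proposed in this issue would amount to extending the list.
extended = ALLOWED_SPARK_BINARIES + ("spark3-submit",)
accepted = validate_spark_binary("spark3-submit", allowed=extended)
```

Under this sketch, any installation that names its binary differently needs its own entry in the list, which is the maintenance concern raised above.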
