ottomata opened a new issue, #30064:
URL: https://github.com/apache/airflow/issues/30064

   ### Apache Airflow Provider(s)
   
   apache-spark
   
   ### Versions of Apache Airflow Providers
   
   4.0.0
   
   ### Apache Airflow version
   
   2.5.1
   
   ### Operating System
   
   Debian GNU/Linux 10 (buster)
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   In airflow-providers-apache-spark 4.0.0, [the value of `spark_binary` was
   hardcoded to be restricted](https://github.com/apache/airflow/commit/93589288156d56aff4b1f822b77695e3c58e4568)
   to either 'spark-submit' or 'spark2-submit'.
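   
A minimal sketch of the kind of allow-list check described above, to make the failure mode concrete. The names `ALLOWED_SPARK_BINARIES`, `validate_spark_binary`, and `is_accepted`, as well as the exception type, are assumptions for illustration only; this is a standalone model of the behaviour and not the provider's actual code.

```python
# Hypothetical model of the restriction; names and exception type are assumed.
ALLOWED_SPARK_BINARIES = ("spark-submit", "spark2-submit")


def validate_spark_binary(spark_binary: str) -> str:
    """Return the binary name if it is on the allow-list, otherwise raise."""
    if spark_binary not in ALLOWED_SPARK_BINARIES:
        raise ValueError(
            f"spark_binary must be one of {ALLOWED_SPARK_BINARIES}, "
            f"got {spark_binary!r}"
        )
    return spark_binary


def is_accepted(spark_binary: str) -> bool:
    """Convenience wrapper: True if the value passes the check."""
    try:
        validate_spark_binary(spark_binary)
    except ValueError:
        return False
    return True
```

Under this model, 'spark-submit' and 'spark2-submit' pass, while 'spark3-submit' is rejected even though it may be a perfectly valid binary on the host.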
   
   What was the reason for this? At the Wikimedia Foundation, we install the
   Spark 3 binary as 'spark3-submit'. This change in the Airflow Spark provider
   4.0.0 has broken some of our DAGs, forcing us to resort to workarounds like
   [this](https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/commit/b962f1501a82fc9bc8e612f2669c68357894e23d#note_20733).
   
   
   ### What you think should happen instead
   
   We'd submit a patch to expand the restriction list to include
   'spark3-submit', but we aren't sure why this was done in the first place. I
   understand the reasoning for removing `spark_home`, but it seems strange to
   have a `spark_binary` parameter and restrict it to these two values.
   
   Can we undo this? If not, should we submit a patch to add 'spark3-submit' to
   the list?
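   
The two options raised here can be sketched side by side. This is only an illustration under assumed names (`DEFAULT_ALLOWED`, `EXTENDED_ALLOWED`, `check_spark_binary`) and does not reflect how the provider is actually structured: passing `allowed=None` models undoing the restriction, and passing an extended tuple models adding 'spark3-submit' to the list.

```python
from typing import Optional, Sequence

# Hypothetical allow-lists; the first mirrors the two values named in this issue.
DEFAULT_ALLOWED = ("spark-submit", "spark2-submit")
EXTENDED_ALLOWED = DEFAULT_ALLOWED + ("spark3-submit",)


def check_spark_binary(
    spark_binary: str,
    allowed: Optional[Sequence[str]] = DEFAULT_ALLOWED,
) -> str:
    """Validate spark_binary against an allow-list.

    allowed=None disables the check entirely (the "undo" option).
    """
    if allowed is not None and spark_binary not in allowed:
        raise ValueError(
            f"spark_binary must be one of {tuple(allowed)}, got {spark_binary!r}"
        )
    return spark_binary


# Option 1: undo the restriction.
unrestricted = check_spark_binary("spark3-submit", allowed=None)

# Option 2: keep the restriction but extend the list.
extended = check_spark_binary("spark3-submit", allowed=EXTENDED_ALLOWED)
```

The second option keeps whatever protection the allow-list was meant to provide, at the cost of needing another patch for every differently named binary.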
   
   ### How to reproduce
   
   Set `spark_binary` to 'spark3-submit'
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's
     [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
