meetic-mrobin commented on issue #46334:
URL: https://github.com/apache/airflow/issues/46334#issuecomment-3183769140

   @rlebreto and I found a solution that works, at least for our use case 
(standalone cluster manager, deploy mode `cluster`, Spark 3.5).
   
   Our conclusion is that `spark-submit --master XXX --status YYY` is broken 
(or we misunderstood how it is supposed to operate):
   - if you point it at the master's binary RPC port 7077, it fails because it 
tries to query that port over HTTP
   - if you point it at the master's REST port 6066, it fails because the REST 
client does not actually understand what the HTTP API returns
   
   So we had to make sure Airflow tracks the driver status using the Curl 
strategy, which means entering this condition in the hook:
   
   ```
   if spark_host.endswith(":6066"):
   ```
   
   
https://github.com/apache/airflow/blob/main/providers/apache/spark/src/airflow/providers/apache/spark/hooks/spark_submit.py#L486
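
   In other words (a simplified paraphrase of that check, not the provider's 
actual code; `spark_host` is the master address built from the Airflow 
connection):

   ```
   def uses_curl_status_tracking(spark_host: str) -> bool:
       """Paraphrase of the condition in spark_submit.py: only a master
       address ending in the REST port 6066 takes the curl-based
       driver-status polling path."""
       return spark_host.endswith(":6066")

   # A connection pointing at the binary RPC port does not qualify;
   # one pointing at the REST port does.
   print(uses_curl_status_tracking("spark://spark-master:7077"))  # False
   print(uses_curl_status_tracking("spark://spark-master:6066"))  # True
   ```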
   
   So we have to set the port to 6066 (i.e. the REST API port) in our Airflow 
Spark connection.
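
   For illustration, one way to define such a connection is via an 
environment variable (the hostname `spark-master` and the connection id 
`spark_default` are placeholders, adjust to your setup):

   ```
   # Hypothetical connection definition; replace spark-master with your
   # master's hostname. The :6066 port is what triggers the Curl strategy.
   export AIRFLOW_CONN_SPARK_DEFAULT='spark://spark-master:6066'
   ```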
   
   In order to make job submission work with this port, we added the 
following config flag in our DAG:
   
   ```
   spark_conf = {
       'spark.master.rest.enabled': 'true',
       ...
   }

   spark_compute_pi = SparkSubmitOperator(
       conf=spark_conf,
       ...
   )
   ```
   
   We also had to enable the REST API on the Spark master side (it's 
disabled by default in recent Spark versions).
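
   For completeness, on our master this amounts to something like the 
following line in `conf/spark-defaults.conf` (property name from the Spark 
standalone docs; restart the master afterwards):

   ```
   spark.master.rest.enabled  true
   ```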

