djuarezg opened a new issue, #38461:
URL: https://github.com/apache/airflow/issues/38461

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### If "Other Airflow 2 version" selected, which one?
   
   2.8.1
   
   ### What happened?
   
   We extend `SparkSubmitOperator` and pass a `queue` to select which of the desired workers should run our tasks.
   
   
   ```
    from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator
   (...)
   class CustomSparkSubmitOperator(SparkSubmitOperator):
       """Extended spark submit operator."""
   
       template_fields = list(SparkSubmitOperator.template_fields)
       template_fields.append("_java_class")
   
   def spark(
       self,
       application,
       java_class,
       task_id,
       priority_weight=1,
       alert_cc=None,
       auto_tags=None,
       **task_kwargs,
   ):
       """Create a task that runs a spark job."""
       _LOGGER.debug(f"Creating task: {task_id}")
   
       env = get_env()
        default_bp_config_path = (
            f"/data/business-parameters/business-parameters-master/data/all.{env}.conf"
        )
       extra_java_options = nlstrip(
           f"""
           -DApplication={application}
           (...)
           """
       )
   
       logstore_options = (
           f"{{{{ params.get('s3a_logstore_class', '{DEFAULT_LOGSTORE}') }}}}"
       )
   
       jar_path = f"{JAR_FOLDER}{{{{ params.jar_name }}}}"
   
       conf = {
           "spark.driver.memory": "{{ params.driver_memory }}",
           (...)
       }
   
       task = CustomSparkSubmitOperator(
           task_id=task_id,
           java_class=java_class,
           conn_id=f"spark_{env}_client",
           priority_weight=priority_weight,
           queue="airflow_spark",
           application=jar_path,
           verbose=True,
           spark_binary=SPARK_BINARY_PATH,
           conf=conf,
           env_vars=get_task_env(None),
           **get_task_kwargs(
               task_kwargs,
               alert_cc=alert_cc,
               dag=self.default_dag,
           ),
       )
       task.auto_tags = ["spark"]
       if auto_tags:
           task.auto_tags.extend(auto_tags)
       return task
   ```
   
   This is then used, but at runtime the task's `queue` ends up as the default value generic to all DAGs rather than `queue="airflow_spark"`.
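
   I have not confirmed this in the provider source, but the symptom is consistent with the operator's `__init__` gaining its own `queue` parameter somewhere around 4.6.0, which would swallow the kwarg before it reaches `BaseOperator`. A standalone sketch of that suspected mechanism (all class names below are illustrative stand-ins, not the real Airflow classes):

   ```python
   class FakeBaseOperator:
       """Stand-in for Airflow's BaseOperator, which stores the executor queue."""

       def __init__(self, *, queue="default", **kwargs):
           self.queue = queue  # executor (e.g. Celery) queue


   class OldStyleOperator(FakeBaseOperator):
       """Before: no `queue` of its own, so the kwarg reaches the base class."""

       def __init__(self, **kwargs):
           super().__init__(**kwargs)


   class NewStyleOperator(FakeBaseOperator):
       """After: a local `queue` parameter consumes the kwarg."""

       def __init__(self, *, queue=None, **kwargs):
           super().__init__(**kwargs)  # `queue` is no longer forwarded
           self.spark_queue = queue    # e.g. reused as a spark-submit queue


   old = OldStyleOperator(queue="airflow_spark")
   new = NewStyleOperator(queue="airflow_spark")
   print(old.queue)  # -> airflow_spark
   print(new.queue)  # -> default (the behavior reported above)
   ```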
   
   
   
   ### What you think should happen instead?
   
   The `queue` value should be kept and used to determine which worker node the task runs on.
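
   Until this is fixed, a possible interim workaround (untested against Airflow; the stub below merely stands in for the real operator so the sketch is self-contained) could be to force the executor queue after construction, since `BaseOperator` keeps it as a plain instance attribute:

   ```python
   class StubSparkSubmitOperator:
       """Stand-in for an operator whose __init__ drops the `queue` kwarg."""

       def __init__(self, **kwargs):
           self.queue = "default"  # falls back to the generic default


   task = StubSparkSubmitOperator(queue="airflow_spark")  # kwarg is ignored
   task.queue = "airflow_spark"  # hypothetical workaround: set it afterwards
   print(task.queue)  # -> airflow_spark
   ```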
   
   ### How to reproduce
   
   Set up a queue using `SparkSubmitOperator` on provider 4.5.0 or earlier vs. 4.6.0 or later and compare which queue the task is assigned to.
   
   ### Operating System
   
   `cat /etc/os-release`: NAME="Rocky Linux" VERSION="8.8 (Green Obsidian)"
   
   ### Versions of Apache Airflow Providers
   
   ```
   /data/airflow/env/bin/pip list | grep providers
   apache-airflow-providers-amazon          8.19.0
   apache-airflow-providers-apache-spark    4.5.0
   apache-airflow-providers-celery          3.6.1
   apache-airflow-providers-common-io       1.3.0
   apache-airflow-providers-common-sql      1.11.1
   apache-airflow-providers-elasticsearch   5.3.3
   apache-airflow-providers-ftp             3.7.0
   apache-airflow-providers-http            4.10.0
   apache-airflow-providers-imap            3.5.0
   apache-airflow-providers-postgres        5.10.2
   apache-airflow-providers-redis           3.6.0
   apache-airflow-providers-slack           8.6.1
   apache-airflow-providers-smtp            1.6.1
   apache-airflow-providers-sqlite          3.7.1
   apache-airflow-providers-ssh             3.10.1
   
   ```
   
   ### Deployment
   
   Virtualenv installation
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

