BCantos17 opened a new issue, #43517:
URL: https://github.com/apache/airflow/issues/43517

   ### Apache Airflow Provider(s)
   
   cncf-kubernetes
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-cncf-kubernetes==8.1.1
   
   ### Apache Airflow version
   
   2.8.4
   
   ### Operating System
   
   Debian GNU/Linux
   
   ### Deployment
   
   Other 3rd-party Helm chart
   
   ### Deployment details
   
   k8s: 1.29
   terraform helm provider: 2.16.1
   
   
   ### What happened
   
   I am attempting to submit a Spark job via the `SparkKubernetesOperator` in Airflow. We have the [spark-operator](https://github.com/kubeflow/spark-operator) v2.0.2 installed.
   
   spark-application-manifest.yaml
   ```yaml
   apiVersion: "sparkoperator.k8s.io/v1beta2"
   kind: SparkApplication
   metadata:
     name: {{ application_name }}
     namespace: airflow
   spec:
     type: Python
     mode: cluster
     image: {{ image_path }}
     imagePullPolicy: IfNotPresent
     pythonVersion: "3"
     mainApplicationFile: "local:////opt/spark/work-dir/runner/v2/__main__.py"
     sparkVersion: "3.5.1"
     timeToLiveSeconds: 600
     sparkConf:
       spark.serializer: org.apache.spark.serializer.KryoSerializer
       spark.dynamicAllocation.executorIdleTimeout: 600s
       spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName: OnDemand
       spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass: io2-storage-class
       spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit: 25Gi
       spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path: /opt/spark/shuffle
       spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly: "false"
       spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName: OnDemand
       spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass: io2-storage-class
       spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit: 25Gi
       spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path: /opt/spark/shuffle
       spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly: "false"
       spark.local.dir: /opt/spark/shuffle
       spark.kubernetes.driver.podTemplateFile: "/opt/spark/conf/pod_template.yaml"
       spark.kubernetes.executor.podTemplateFile: "/opt/spark/conf/pod_template.yaml"
     hadoopConf:
       fs.s3a.aws.credentials.provider: "com.amazonaws.auth.WebIdentityTokenCredentialsProvider"
       fs.s3a.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
     restartPolicy:
       type: Never
     driver:
       cores: 8
       coreRequest: "8"
       coreLimit: "8"
       memory: "20G" 
       memoryOverheadFactor: 0.10 
       serviceAccount: spark-app
     executor:
       deleteOnTermination: false
       cores: 8
       coreRequest: "8"
       coreLimit: "8"
       instances: 29 
       memory: "20G" 
       memoryOverheadFactor: 0.10 
       serviceAccount: spark-app
   ```
   
   Note the `spark.kubernetes.driver.podTemplateFile` parameter. The path `/opt/spark/conf/pod_template.yaml` exists in the Spark Operator pod in our cluster; we mounted it as a ConfigMap, and I confirmed the file is present in that pod.
   
   After the DAG starts and attempts to submit the Spark job, I get this error:
   
   ```
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/abstractoperator.py", line 699, in _do_render_template_fields
       rendered_content = self.render_template(
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 186, in render_template
       return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 186, in <dictcomp>
       return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 186, in render_template
       return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 186, in <dictcomp>
       return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 186, in render_template
       return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 186, in <dictcomp>
       return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 171, in render_template
       template = jinja_env.get_template(value)
     File "/home/airflow/.local/lib/python3.10/site-packages/jinja2/environment.py", line 1010, in get_template
       return self._load_template(name, globals)
     File "/home/airflow/.local/lib/python3.10/site-packages/jinja2/environment.py", line 969, in _load_template
       template = self.loader.load(self, name, self.make_globals(globals))
     File "/home/airflow/.local/lib/python3.10/site-packages/jinja2/loaders.py", line 125, in load
       source, filename, uptodate = self.get_source(environment, name)
     File "/home/airflow/.local/lib/python3.10/site-packages/jinja2/loaders.py", line 204, in get_source
       raise TemplateNotFound(template)
   jinja2.exceptions.TemplateNotFound: /opt/spark/conf/pod_template.yaml
   [2024-10-30, 11:20:16 EDT] {taskinstance.py:2731} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/taskinstance.py", line 2360, in _run_raw_task
       self._execute_task_with_callbacks(context, test_mode, session=session)
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/taskinstance.py", line 2498, in _execute_task_with_callbacks
       task_orig = self.render_templates(context=context)
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/taskinstance.py", line 2910, in render_templates
       original_task.render_template_fields(context)
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/baseoperator.py", line 1241, in render_template_fields
       self._do_render_template_fields(self, self.template_fields, context, jinja_env, set())
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/utils/session.py", line 79, in wrapper
       return func(*args, session=session, **kwargs)
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/abstractoperator.py", line 699, in _do_render_template_fields
       rendered_content = self.render_template(
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 186, in render_template
       return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 186, in <dictcomp>
       return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 186, in render_template
       return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 186, in <dictcomp>
       return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 186, in render_template
       return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 186, in <dictcomp>
       return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 171, in render_template
       template = jinja_env.get_template(value)
     File "/home/airflow/.local/lib/python3.10/site-packages/jinja2/environment.py", line 1010, in get_template
       return self._load_template(name, globals)
     File "/home/airflow/.local/lib/python3.10/site-packages/jinja2/environment.py", line 969, in _load_template
       template = self.loader.load(self, name, self.make_globals(globals))
     File "/home/airflow/.local/lib/python3.10/site-packages/jinja2/loaders.py", line 125, in load
       source, filename, uptodate = self.get_source(environment, name)
     File "/home/airflow/.local/lib/python3.10/site-packages/jinja2/loaders.py", line 204, in get_source
       raise TemplateNotFound(template)
   jinja2.exceptions.TemplateNotFound: /opt/spark/conf/pod_template.yaml
   ```
   
   However, if I remove `spark.kubernetes.driver.podTemplateFile`, the job is submitted without issue, which points at Airflow's template rendering rather than the Spark Operator. The sketch below shows my understanding of the mechanism.
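   
   As far as I can tell from the traceback, the operator's templated fields are rendered recursively, and any string value ending in one of the operator's `template_ext` extensions (which I assume includes `.yaml`) is treated as the name of a Jinja template file and looked up on the Airflow worker's filesystem. A standalone sketch of that failure mode (the loader path and `template_ext` tuple here are my assumptions, not values taken from the Airflow source):
   
   ```python
   from jinja2 import Environment, FileSystemLoader, TemplateNotFound
   
   # Worker-side template search path; illustrative only.
   jinja_env = Environment(loader=FileSystemLoader("/opt/airflow/dags"))
   template_ext = (".yaml", ".yml", ".json")  # assumed operator template_ext
   
   def render_value(value: str) -> str:
       if value.endswith(template_ext):
           # Mirrors templater.py line 171 in the traceback: the path only
           # exists in the Spark Operator pod, never on the Airflow worker.
           return jinja_env.get_template(value).render()
       return jinja_env.from_string(value).render()
   
   try:
       render_value("/opt/spark/conf/pod_template.yaml")
   except TemplateNotFound as exc:
       print(f"jinja2.exceptions.TemplateNotFound: {exc}")  # same error as above
   ```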
   
   ### What you think should happen instead
   
   Airflow should pass the SparkApplication manifest along to the Spark Operator running in our Kubernetes cluster without trying to resolve the `spark.kubernetes.driver.podTemplateFile` path itself. That path is only meaningful inside the Spark Operator and driver/executor pods; the Airflow pod should just submit the manifest and not look for the file on its own filesystem.
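   
   In the meantime, a possible workaround sketch, untested, and only applicable if the manifest is built as a Python dict (e.g. via the operator's `template_spec` parameter) rather than loaded from a YAML file: Airflow 2.8 added `airflow.utils.template.literal`, which as I read it marks a value as exempt from template rendering:
   
   ```python
   # Hypothetical workaround, assuming the manifest is passed as a dict instead
   # of an application_file; untested against this exact setup.
   from airflow.utils.template import literal
   
   spark_conf = {
       "spark.local.dir": "/opt/spark/shuffle",
       # literal() should keep the templater from treating the ".yaml"-suffixed
       # value as a Jinja template file name on the Airflow worker.
       "spark.kubernetes.driver.podTemplateFile": literal("/opt/spark/conf/pod_template.yaml"),
       "spark.kubernetes.executor.podTemplateFile": literal("/opt/spark/conf/pod_template.yaml"),
   }
   ```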
   
   ### How to reproduce
   
   Install Airflow 2.8.4 and Spark Operator 2.0.2, create a SparkApplication manifest similar to the one above with `spark.kubernetes.driver.podTemplateFile` and `spark.kubernetes.executor.podTemplateFile` set, and try to run it (see the sketch below).
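   
   A minimal DAG along these lines triggers the error (the dag_id, task_id, and connection id here are illustrative, not from our actual setup):
   
   ```python
   import pendulum
   from airflow import DAG
   from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import SparkKubernetesOperator
   
   with DAG(
       dag_id="spark_pod_template_repro",
       start_date=pendulum.datetime(2024, 10, 1, tz="UTC"),
       schedule=None,
       catchup=False,
   ):
       # Points at the manifest shown above, which sets
       # spark.kubernetes.driver.podTemplateFile in sparkConf.
       SparkKubernetesOperator(
           task_id="submit_spark_app",
           namespace="airflow",
           application_file="spark-application-manifest.yaml",
           kubernetes_conn_id="kubernetes_default",
       )
   ```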
   
   ### Anything else
   
   I have also experienced the same issue with an older version of the Spark Operator, so the problem is independent of the Spark Operator version.
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

