BCantos17 opened a new issue, #43517: URL: https://github.com/apache/airflow/issues/43517
### Apache Airflow Provider(s)

cncf-kubernetes

### Versions of Apache Airflow Providers

8.1.1

### Apache Airflow version

2.8.4

### Operating System

Debian GNU/Linux

### Deployment

Other 3rd-party Helm chart

### Deployment details

k8s: 1.29
terraform helm provider: 2.16.1

### What happened

Attempting to submit a Spark job via SparkKubernetesOperator in Airflow. We have the [spark-operator](https://github.com/kubeflow/spark-operator) v2.0.2 installed.

spark-application-manifest.yaml

```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: {{ application_name }}
  namespace: airflow
spec:
  type: Python
  mode: cluster
  image: {{ image_path }}
  imagePullPolicy: IfNotPresent
  pythonVersion: "3"
  mainApplicationFile: "local:////opt/spark/work-dir/runner/v2/__main__.py"
  sparkVersion: "3.5.1"
  timeToLiveSeconds: 600
  sparkConf:
    spark.serializer: org.apache.spark.serializer.KryoSerializer
    spark.dynamicAllocation.executorIdleTimeout: 600s
    spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName: OnDemand
    spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass: io2-storage-class
    spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit: 25Gi
    spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path: /opt/spark/shuffle
    spark.kubernetes.driver.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly: "false"
    spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName: OnDemand
    spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass: io2-storage-class
    spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit: 25Gi
    spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path: /opt/spark/shuffle
    spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly: "false"
    spark.local.dir: /opt/spark/shuffle
    spark.kubernetes.driver.podTemplateFile: "/opt/spark/conf/pod_template.yaml"
    spark.kubernetes.executor.podTemplateFile: "/opt/spark/conf/pod_template.yaml"
  hadoopConf:
    fs.s3a.aws.credentials.provider: "com.amazonaws.auth.WebIdentityTokenCredentialsProvider"
    fs.s3a.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
  restartPolicy:
    type: Never
  driver:
    cores: 8
    coreRequest: "8"
    coreLimit: "8"
    memory: "20G"
    memoryOverheadFactor: 0.10
    serviceAccount: spark-app
  executor:
    deleteOnTermination: false
    cores: 8
    coreRequest: "8"
    coreLimit: "8"
    instances: 29
    memory: "20G"
    memoryOverheadFactor: 0.10
    serviceAccount: spark-app
```

Note the `spark.kubernetes.driver.podTemplateFile` parameter. The path `/opt/spark/conf/pod_template.yaml` exists in the Spark Operator pod in our cluster; we mounted it as a ConfigMap, and I confirmed the file is present in the Spark Operator pod.
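For reference, a sketch of how the task is defined. The identifiers and the `params` wiring are illustrative, not our exact code, but judging from the dict-recursion frames in the traceback below, the manifest reaches the operator as a dict:

```python
from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import SparkKubernetesOperator

# Illustrative only -- not our exact DAG. The manifest above is handed to
# the operator as a dict via template_spec (a templated field), so Airflow
# recursively renders every string value inside it, including sparkConf.
with DAG(dag_id="spark_application_dag", schedule=None) as dag:
    submit_spark_job = SparkKubernetesOperator(
        task_id="submit_spark_job",
        namespace="airflow",
        template_spec={
            "apiVersion": "sparkoperator.k8s.io/v1beta2",
            "kind": "SparkApplication",
            "metadata": {"name": "{{ params.application_name }}", "namespace": "airflow"},
            "spec": {
                # ... rest of the spec from the manifest above ...
                "sparkConf": {
                    "spark.kubernetes.driver.podTemplateFile": "/opt/spark/conf/pod_template.yaml",
                    "spark.kubernetes.executor.podTemplateFile": "/opt/spark/conf/pod_template.yaml",
                },
            },
        },
        # Placeholder wired through params here for the sketch; our real
        # code resolves the application name differently.
        params={"application_name": "my-spark-app"},
    )
```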
After the DAG starts and attempts to submit the Spark job, the task fails with this error:

```
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/abstractoperator.py", line 699, in _do_render_template_fields
    rendered_content = self.render_template(
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 186, in render_template
    return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 186, in <dictcomp>
    return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 186, in render_template
    return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 186, in <dictcomp>
    return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 186, in render_template
    return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 186, in <dictcomp>
    return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 171, in render_template
    template = jinja_env.get_template(value)
  File "/home/airflow/.local/lib/python3.10/site-packages/jinja2/environment.py", line 1010, in get_template
    return self._load_template(name, globals)
  File "/home/airflow/.local/lib/python3.10/site-packages/jinja2/environment.py", line 969, in _load_template
    template = self.loader.load(self, name, self.make_globals(globals))
  File "/home/airflow/.local/lib/python3.10/site-packages/jinja2/loaders.py", line 125, in load
    source, filename, uptodate = self.get_source(environment, name)
  File "/home/airflow/.local/lib/python3.10/site-packages/jinja2/loaders.py", line 204, in get_source
    raise TemplateNotFound(template)
jinja2.exceptions.TemplateNotFound: /opt/spark/conf/pod_template.yaml
[2024-10-30, 11:20:16 EDT] {taskinstance.py:2731} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/taskinstance.py", line 2360, in _run_raw_task
    self._execute_task_with_callbacks(context, test_mode, session=session)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/taskinstance.py", line 2498, in _execute_task_with_callbacks
    task_orig = self.render_templates(context=context)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/taskinstance.py", line 2910, in render_templates
    original_task.render_template_fields(context)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/baseoperator.py", line 1241, in render_template_fields
    self._do_render_template_fields(self, self.template_fields, context, jinja_env, set())
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/utils/session.py", line 79, in wrapper
    return func(*args, session=session, **kwargs)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/abstractoperator.py", line 699, in _do_render_template_fields
    rendered_content = self.render_template(
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 186, in render_template
    return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 186, in <dictcomp>
    return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 186, in render_template
    return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 186, in <dictcomp>
    return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 186, in render_template
    return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 186, in <dictcomp>
    return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/template/templater.py", line 171, in render_template
    template = jinja_env.get_template(value)
  File "/home/airflow/.local/lib/python3.10/site-packages/jinja2/environment.py", line 1010, in get_template
    return self._load_template(name, globals)
  File "/home/airflow/.local/lib/python3.10/site-packages/jinja2/environment.py", line 969, in _load_template
    template = self.loader.load(self, name, self.make_globals(globals))
  File "/home/airflow/.local/lib/python3.10/site-packages/jinja2/loaders.py", line 125, in load
    source, filename, uptodate = self.get_source(environment, name)
  File "/home/airflow/.local/lib/python3.10/site-packages/jinja2/loaders.py", line 204, in get_source
    raise TemplateNotFound(template)
jinja2.exceptions.TemplateNotFound: /opt/spark/conf/pod_template.yaml
```
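Looking at the trace, the failure comes from `templater.py` line 171 (`template = jinja_env.get_template(value)`): Airflow's templater appears to treat any string value ending with one of the operator's `template_ext` extensions (`.yaml` here) as the path of a Jinja template file and tries to load it from the DAG folder on the Airflow worker, where it does not exist. A minimal standalone reproduction of that rule (paths and extensions are illustrative stand-ins, not the actual Airflow source):

```python
import jinja2

# Stand-ins for the operator's template_ext and for the DAG-folder loader
# that Airflow's Jinja environment uses.
template_ext = (".yaml", ".yml")
env = jinja2.Environment(loader=jinja2.FileSystemLoader("/opt/airflow/dags"))

value = "/opt/spark/conf/pod_template.yaml"
if value.endswith(template_ext):
    # Mirrors templater.py line 171 in the trace above: the string is taken
    # as a template *file path*, and the lookup fails because the file only
    # exists inside the Spark Operator pod, not on the Airflow worker.
    env.get_template(value)  # raises jinja2.exceptions.TemplateNotFound
```

If that reading is right, a possible workaround (untested on our side) is the `literal()` helper that Airflow 2.8 added to exclude a value from template rendering. Wrapping only the two pod-template paths seems safer than wrapping the whole manifest, since the `{{ application_name }}` placeholders still need to render:

```python
from airflow.utils.template import literal

# Hypothetical tweak to the sparkConf section of the manifest dict: wrap the
# pod-template paths so the templater passes them through unchanged instead
# of trying to load them as Jinja template files.
spark_conf = {
    "spark.kubernetes.driver.podTemplateFile": literal("/opt/spark/conf/pod_template.yaml"),
    "spark.kubernetes.executor.podTemplateFile": literal("/opt/spark/conf/pod_template.yaml"),
    # ... remaining sparkConf entries unchanged ...
}
```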
However, if I remove `spark.kubernetes.driver.podTemplateFile`, the job is submitted without any issue.

### What you think should happen instead

Airflow should pass the SparkApplication manifest along to the Spark Operator running in our Kubernetes cluster without trying to resolve the `spark.kubernetes.driver.podTemplateFile` path itself. The Airflow pod should only submit the manifest; the pod template file only needs to exist in the Spark Operator pod, not in the Airflow pod.

### How to reproduce

Install Airflow 2.8.4 and Spark Operator 2.0.2, create a SparkApplication manifest similar to the one above with `spark.kubernetes.driver.podTemplateFile` and `spark.kubernetes.executor.podTemplateFile` set, and try to run it.

### Anything else

I have also experienced the same issue with an older version of the Spark Operator, so it is not specific to the Spark Operator version.

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
