Coqueiro opened a new issue #13076:
URL: https://github.com/apache/airflow/issues/13076


   **Apache Airflow version**: 1.10.12 with Python 3.7
   **Kubernetes version**:
   ```
   Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.5", GitCommit:"e6503f8d8f769ace2f338794c914a96fc335df0f", GitTreeState:"clean", BuildDate:"2020-06-27T00:38:11Z", GoVersion:"go1.14.4", Compiler:"gc", Platform:"darwin/amd64"}
   Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.8", GitCommit:"9f2892aab98fe339f3bd70e3c470144299398ace", GitTreeState:"clean", BuildDate:"2020-08-13T16:04:18Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
   ```
   
   I'm running an Airflow cluster with the CeleryExecutor inside a Kubernetes cluster, after installing the `cncf.kubernetes` backport package. I already did some testing with the Spark Operator inside the cluster, and I'm able to run `SparkApplications` smoothly by applying them with `kubectl`. I'm now trying to make an Airflow DAG execute them. When doing so, I ran into this message:
   
   ``` 
   [2020-12-14 22:18:10,609] {taskinstance.py:1150} ERROR - Invalid kube-config file. No configuration found.
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
       result = task_copy.execute(context=context)
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/operators/spark_kubernetes.py", line 67, in execute
       namespace=self.namespace,
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/hooks/kubernetes.py", line 127, in create_custom_object
       api = client.CustomObjectsApi(self.api_client)
     File "/home/airflow/.local/lib/python3.7/site-packages/cached_property.py", line 35, in __get__
       value = obj.__dict__[self.func.__name__] = self.func(obj)
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/hooks/kubernetes.py", line 108, in api_client
       return self.get_conn()
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/cncf/kubernetes/hooks/kubernetes.py", line 102, in get_conn
       config.load_kube_config(client_configuration=self.client_configuration)
     File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 739, in load_kube_config
       persist_config=persist_config)
     File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 701, in _get_kube_config_loader_for_yaml_file
       'Invalid kube-config file. '
   kubernetes.config.config_exception.ConfigException: Invalid kube-config file. No configuration found.
   ```
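   
   From the traceback, the worker seems to be going down the kube-config-file path (`config.load_kube_config`) rather than the in-cluster path. For reference, here is a minimal sketch of the two loading paths in the `kubernetes` Python client as I understand them (my own simplification, not the provider's actual code):
   
   ```
   from kubernetes import client, config


   def get_api_client(in_cluster: bool) -> client.ApiClient:
       """Build a Kubernetes API client from one of the two possible config sources."""
       if in_cluster:
           # Uses the service-account token and CA certificate mounted into the pod.
           config.load_incluster_config()
       else:
           # Falls back to a kube-config file (~/.kube/config by default); this is
           # the branch that raises "Invalid kube-config file. No configuration
           # found." when no such file exists inside the worker pod.
           config.load_kube_config()
       return client.ApiClient()
   ```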
   
   Here's my DAG code:
   ```
   import os
   
   from datetime import datetime, timedelta
   
   from airflow import DAG
   from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import SparkKubernetesOperator
   from airflow.operators.dummy_operator import DummyOperator
   
   DAG_ID = "spark_operator_test"
   DESCRIPTION = ""
   SCHEDULE_INTERVAL = "0 10 * * *"
   
   default_args = {
       "owner": "airflow",
       "depends_on_past": False,
       "start_date": datetime(2020, 11, 18),
       "retries": 1,
       "retry_delay": timedelta(minutes=5),
       "provide_context": True
   }
   
   with DAG(
           dag_id=DAG_ID,
           schedule_interval=SCHEDULE_INTERVAL,
           description=DESCRIPTION,
           default_args=default_args,
           catchup=False,
           concurrency=1
   ) as dag:
       end_dag = DummyOperator(task_id='end_dag')
   
       t1 = SparkKubernetesOperator(
           task_id='spark_operator_execute',
           namespace="analytics-airflow",
           application_file="resources/spark/spark-operator-test.yaml",
           kubernetes_conn_id="kubernetes_default",
           do_xcom_push=True
       )
   
       t1 >> end_dag
   ```
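   
   As far as I can tell from the traceback, the operator hands the manifest to the `KubernetesHook`, which builds a `CustomObjectsApi` client and fails while loading its configuration. A rough, stripped-down sketch of that chain as I understand it (argument names and values are my guesses from the traceback and the Spark Operator CRD, not copied from the provider code):
   
   ```
   import yaml

   from airflow.providers.cncf.kubernetes.hooks.kubernetes import KubernetesHook

   # Hypothetical equivalent of what the task does for me under the hood.
   hook = KubernetesHook(conn_id="kubernetes_default")

   with open("resources/spark/spark-operator-test.yaml") as f:
       application_manifest = yaml.safe_load(f)

   # create_custom_object() accesses hook.api_client -> hook.get_conn(), which is
   # where config.load_kube_config() raises, since no kube-config file exists
   # inside the worker pod.
   hook.create_custom_object(
       group="sparkoperator.k8s.io",   # Spark Operator CRD group
       version="v1beta2",
       plural="sparkapplications",
       body=application_manifest,
       namespace="analytics-airflow",
   )
   ```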
   
   I tried creating a connection with the following extra:
   ```
   {"extra__kubernetes__in_cluster":True}
   ```
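   
   For reference, this is roughly how I would expect such a connection to be created programmatically (only a sketch, with the extra written as strict JSON, i.e. a lowercase `true`):
   
   ```
   import json

   from airflow.models import Connection
   from airflow.utils.db import provide_session


   @provide_session
   def create_kubernetes_default_conn(session=None):
       # Sketch: a Kubernetes connection asking the hook to use the in-cluster
       # service-account configuration instead of a kube-config file.
       conn = Connection(
           conn_id="kubernetes_default",
           conn_type="kubernetes",
           extra=json.dumps({"extra__kubernetes__in_cluster": True}),
       )
       session.add(conn)
       session.commit()
   ```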
   
   But it didn't work either. How can I correctly configure my Airflow deployment so that a task using the `SparkKubernetesOperator` can create a `SparkApplication` within the same cluster?

