[GitHub] [airflow] kaxil commented on a change in pull request #7406: [AIRFLOW-XXXX] Add architecture section to k8sexec docs

GitBox Fri, 21 Feb 2020 10:45:15 -0800

kaxil commented on a change in pull request #7406: [AIRFLOW-XXXX] Add 
architecture section to k8sexec docs
URL: https://github.com/apache/airflow/pull/7406#discussion_r382744565


 ##########
 File path: docs/executor/kubernetes.rst
 ##########
 @@ -34,3 +34,71 @@ The volumes are optional and depend on your configuration. 
There are two volumes
   - By storing logs onto a persistent disk, the files are accessible by 
workers and the webserver. If you don't configure this, the logs will be lost 
after the worker pod shuts down
 
   - Another option is to use S3/GCS/etc to store logs
+
+KubernetesExecutor Architecture
+################################
+
+The KubernetesExecutor runs as a process in the Scheduler that only requires 
access to the Kubernetes API (it does *not* need to run inside of a Kubernetes 
cluster). The KubernetesExecutor requires a non-sqlite database in the backend, 
but there are no external brokers or persistent workers needed.
+For these reasons, we recommend the KubernetesExecutor for deployments that 
have long periods of dormancy between DAG executions.
+
+
+.. image:: ../img/k8s-0-worker.jpeg
+
+
+When a DAG submits a task, the KubernetesExecutor requests a worker pod from 
the Kubernetes API. The worker pod then runs the task, reports the result, and 
terminates.
+
+
+
+.. image:: ../img/k8s-3-worker.jpeg
+
+.. @startuml
+.. Airflow_Scheduler -> Kubernetes: Request a new pod with command "airflow 
run..."
+.. Kubernetes -> Airflow_Worker: Create Airflow worker with command "airflow 
run..."
+.. Airflow_Worker -> Airflow_DB: Report task passing or failure to DB
+.. Airflow_Worker -> Kubernetes: Pod completes with state "Succeeded" and k8s 
records in ETCD
+.. Kubernetes -> Airflow_Scheduler: Airflow scheduler reads "Succeeded" from 
k8s watcher thread
+.. @enduml
+.. image:: ../img/k8s-happy-path.png
+
+
+***************
+Fault Tolerance
+***************
+
+===========================
+Handling Worker Pod Crashes
+===========================
+
+When dealing with distributed systems, we need a system that assumes that any 
component can crash at any moment for reasons ranging from OOM errors to node 
upgrades.
+
+In the case where a worker dies before it can report its status to the backend 
DB, the executor can use a Kubernetes watcher thread to discover the failed pod.
+
+.. image:: ../img/k8s-watcher-2.jpeg
 
 Review comment:
   We don't have this image!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
