mik-laj opened a new issue #9981: URL: https://github.com/apache/airflow/issues/9981
Hello,

Airflow can define what a pod created with KubernetesExecutor will look like in two ways:

- with many configuration options. For example, to configure an init container with git sync you can use the options: `git_repo`, `git_branch`, `git_sync_depth`, `git_subpath`, `git_sync_rev`, `git_user`, `git_password`, `git_sync_root`, `git_sync_dest`, `git_dags_folder_mount_point`, `git_ssh_key_secret_name`, `git_ssh_known_hosts_configmap_name`, `git_sync_credentials_secret`, `git_sync_container_repository`, `git_sync_container_tag`, `git_sync_init_container_name`, `git_sync_run_as_user`
- with `pod_template_file`: the path to a YAML pod file. If set, all other Kubernetes-related fields are ignored.

Because the first approach needs a separate Airflow option for every pod setting, we added a more flexible solution: the `pod_template_file` option. This allows us to use all Kubernetes features without changing the Airflow code. For example, we can add a sidecar container with git sync by following the official documentation.

However, passing this file from Helm is problematic. We have to run Helm twice: the first time to generate the `pod_template_file`, and the second time to generate the configuration for Airflow. This is caused by **inconsistency between ecosystems.**

We could instead keep the pod template in a [Custom Resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/). This approach is common in Kubernetes: many built-in core resources embed a [PodTemplate](https://kubernetes.io/docs/concepts/workloads/pods/pod-overview/#pod-templates), e.g. [Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/), [Jobs](https://kubernetes.io/docs/concepts/jobs/run-to-completion-finite-workloads/), [DaemonSets](https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/).
API Reference for PodTemplateSpec: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podtemplatespec-v1-core

An additional benefit will be the ability to change the configuration without restarting the scheduler. We can add automation that watches the resource for new versions.

An example template might look like this:

```yaml
apiVersion: "airflow.apache.org/v1"
kind: WorkerTemplate
metadata:
  name: main-airflow-template
template:
  metadata:
    labels:
      app: nginx
  spec:
    containers:
    - name: git-sync-test
      image: polidea-airflow.gcr.io/airfllow-main:v2.1.0
      volumeMounts:
      - name: service
        mountPath: /var/magic
    initContainers:
    - name: git-sync
      image: k8s.gcr.io/git-sync-amd64:v2.0.6
      imagePullPolicy: Always
      volumeMounts:
      - name: service
        mountPath: /magic
      - name: git-secret
        mountPath: /etc/git-secret
      env:
      - name: GIT_SYNC_REPO
        value: <repo-path-you-want-to-clone>
      - name: GIT_SYNC_BRANCH
        value: <repo-branch>
      - name: GIT_SYNC_ROOT
        value: /magic
      - name: GIT_SYNC_DEST
        value: <path-where-you-want-to-clone>
      - name: GIT_SYNC_PERMISSIONS
        value: "0777"
      - name: GIT_SYNC_ONE_TIME
        value: "true"
      - name: GIT_SYNC_SSH
        value: "true"
      securityContext:
        runAsUser: 0
    volumes:
    - name: service
      emptyDir: {}
    - name: git-secret
      secret:
        defaultMode: 256
        secretName: git-creds
```

What do you think about this approach? Is this the direction Airflow should go?
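To make the idea a bit more concrete, here is a minimal Python sketch of how the scheduler side might consume such a resource. It is only an illustration under stated assumptions, not an existing Airflow API: the names `pod_from_worker_template` and `TemplateCache` are hypothetical, and the resource is passed in as a plain dict shaped like the example template above. The only Kubernetes detail it relies on is the standard `metadata.resourceVersion` field, which is treated as an opaque string and compared only for equality.

```python
import copy
from typing import Any, Dict, Optional


def pod_from_worker_template(resource: Dict[str, Any], pod_name: str) -> Dict[str, Any]:
    """Build a plain Pod manifest from the `template` field of the resource.

    Hypothetical helper: assumes `resource["template"]` holds a
    PodTemplateSpec-like mapping with `metadata` and `spec` keys.
    """
    template = resource["template"]
    # Copy so that per-pod changes never leak back into the shared template.
    metadata = copy.deepcopy(template.get("metadata", {}))
    metadata["name"] = pod_name
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": metadata,
        "spec": copy.deepcopy(template["spec"]),
    }


class TemplateCache:
    """Hold the latest template and replace it only when resourceVersion changes."""

    def __init__(self) -> None:
        self._resource_version: Optional[str] = None
        self._resource: Optional[Dict[str, Any]] = None

    def update(self, resource: Dict[str, Any]) -> bool:
        """Store `resource`; return True if it replaced the cached copy."""
        version = resource["metadata"].get("resourceVersion")
        if self._resource is not None and version == self._resource_version:
            return False  # same version already cached, nothing to reload
        self._resource_version = version
        self._resource = copy.deepcopy(resource)
        return True

    def build_pod(self, pod_name: str) -> Dict[str, Any]:
        """Create a Pod manifest for a worker from the cached template."""
        if self._resource is None:
            raise RuntimeError("no worker template has been loaded yet")
        return pod_from_worker_template(self._resource, pod_name)
```

In a real deployment the dicts passed to `update` would come from a watch on the custom resource rather than from literals, so a new template would take effect for subsequently created worker pods without a scheduler restart.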
