mik-laj opened a new issue #9981:
URL: https://github.com/apache/airflow/issues/9981


   Hello,
   
   Airflow has the ability to define what a pod created with KubernetesExecutor will look like in two ways:
   - with many configuration options. For example, to configure an init container with git sync, you can use the options: `git_repo`, `git_branch`, `git_sync_depth`, `git_subpath`, `git_sync_rev`, `git_user`, `git_password`, `git_sync_root`, `git_sync_dest`, `git_dags_folder_mount_point`, `git_ssh_key_secret_name`, `git_ssh_known_hosts_configmap_name`, `git_sync_credentials_secret`, `git_sync_container_repository`, `git_sync_container_tag`, `git_sync_init_container_name`, `git_sync_run_as_user`
   - with `pod_template_file` - a path to a YAML pod file. If set, all other Kubernetes-related fields are ignored.
   
   To address this problem, we added a more flexible solution - the `pod_template_file` option. This allows us to use all Kubernetes features without changing the Airflow code. For example, we can add a sidecar container with git sync based on the official documentation.
   
   However, it is problematic to pass this file from Helm. We have to run Helm twice: the first time to generate the pod_template_file, and the second time to generate the configuration for Airflow. This is caused by **inconsistency between ecosystems.**
   
   We can keep the pod template in a [Custom Resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/). This approach is common in Kubernetes: many built-in resources in the core embed a [PodTemplate](https://kubernetes.io/docs/concepts/workloads/pods/pod-overview/#pod-templates), e.g. [Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/), [Jobs](https://kubernetes.io/docs/concepts/jobs/run-to-completion-finite-workloads/), [DaemonSets](https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/).
   API Reference for PodTemplateSpec: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podtemplatespec-v1-core
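
   For illustration, the new resource kind could be registered with a CustomResourceDefinition along these lines (the group `airflow.apache.org`, the kind `WorkerTemplate`, and the schema details are assumptions for this sketch, not a final design):
   ```yaml
   apiVersion: apiextensions.k8s.io/v1
   kind: CustomResourceDefinition
   metadata:
     name: workertemplates.airflow.apache.org
   spec:
     group: airflow.apache.org
     scope: Namespaced
     names:
       kind: WorkerTemplate
       plural: workertemplates
       singular: workertemplate
     versions:
     - name: v1
       served: true
       storage: true
       schema:
         openAPIV3Schema:
           type: object
           properties:
             template:
               # Free-form for the sketch; a real CRD could embed the full
               # PodTemplateSpec schema here instead.
               type: object
               x-kubernetes-preserve-unknown-fields: true
   ```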
   
   An additional benefit will be the ability to change the configuration without restarting the scheduler. We can add automation that watches the new resource for updates and picks up each new resource version.
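
   As a sketch of that watch logic, the scheduler only needs to reload when the resource's `resourceVersion` changes. The helper below is illustrative and stdlib-only; in a real implementation the events would come from the Kubernetes watch API (e.g. `kubernetes.watch.Watch().stream(...)` on the custom resource):

```python
# Sketch of the reload decision for a watcher on the proposed
# WorkerTemplate resource. The event shape mirrors what the
# Kubernetes watch API delivers; the resource name and field
# layout are assumptions for illustration.

def needs_reload(event, known_resource_version):
    """Return the new resourceVersion if the template changed, else None."""
    if event["type"] not in ("ADDED", "MODIFIED"):
        return None
    version = event["object"]["metadata"]["resourceVersion"]
    if version == known_resource_version:
        return None
    return version

# Hand-made event to show the contract; a real watcher would
# receive these from the API server's watch stream.
event = {
    "type": "MODIFIED",
    "object": {"metadata": {"name": "main-airflow-template",
                            "resourceVersion": "42"}},
}
print(needs_reload(event, known_resource_version="41"))  # -> 42
```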
   
   An example template might look like below
   ```yaml
   apiVersion: "airflow.apache.org/v1"
   kind: WorkerTemplate
   metadata:
     name: main-airflow-template
   template:
       metadata:
         labels:
           app: nginx
       spec:
         containers:
         - name: git-sync-test
           image: polidea-airflow.gcr.io/airflow-main:v2.1.0
           volumeMounts:
           - name: service
             mountPath: /var/magic
         initContainers:
         - name: git-sync
           image: k8s.gcr.io/git-sync-amd64:v2.0.6
           imagePullPolicy: Always
           volumeMounts:
           - name: service
             mountPath: /magic
           - name: git-secret
             mountPath: /etc/git-secret
           env:
           - name: GIT_SYNC_REPO
             value: <repo-path-you-want-to-clone>
           - name: GIT_SYNC_BRANCH
             value: <repo-branch>
           - name: GIT_SYNC_ROOT
             value: /magic
           - name: GIT_SYNC_DEST
             value: <path-where-you-want-to-clone>
           - name: GIT_SYNC_PERMISSIONS
             value: "0777"
           - name: GIT_SYNC_ONE_TIME
             value: "true"
           - name: GIT_SYNC_SSH
             value: "true"
           securityContext:
             runAsUser: 0
         volumes:
         - name: service
           emptyDir: {}
         - name: git-secret
           secret:
             defaultMode: 256
             secretName: git-creds
   ```
   
   What do you think about this approach? Is this the direction Airflow should 
go?
   
   
   
   

