bpleines opened a new issue #11789:
URL: https://github.com/apache/airflow/issues/11789


   **Apache Airflow version**: 1.10.12
   
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`): `1.18.8`
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**:
   - **OS** (e.g. from /etc/os-release):
   - **Kernel** (e.g. `uname -a`):
   - **Install tools**:
   - **Others**:
   
   **What happened**:
   
   The included `git-sync` example has a few issues that, if addressed, may aid adoption of this powerful configuration option:

   1. It does not properly share the Kubernetes volume containing the DAGs.
   2. It omits the `GIT_SYNC_ONE_TIME` option, which is necessary for the initContainer to exit after syncing.
   3. `GIT_SYNC_WAIT` is not applicable, because the initContainer should exit immediately after syncing.
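The three problems above can be checked mechanically. The sketch below uses a hypothetical helper, `find_git_sync_issues`, which is not part of Airflow. It makes three assumptions:

- The pod template has already been parsed into plain Python dicts and lists, for example with PyYAML's `yaml.safe_load`.
- The sync container is the initContainer named `git-sync`.
- "Properly shared" means the same named volume is mounted both in that initContainer and in a main container.

```python
from __future__ import annotations

from typing import Any


def find_git_sync_issues(pod: dict[str, Any]) -> list[str]:
    """Report the three git-sync template problems for a parsed Pod manifest.

    Hypothetical helper: assumes `pod` is a plain dict (e.g. from
    yaml.safe_load) and that the sync container is the initContainer
    named "git-sync".
    """
    spec = pod["spec"]
    problems: list[str] = []

    init = next(
        (c for c in spec.get("initContainers", []) if c.get("name") == "git-sync"),
        None,
    )
    if init is None:
        return ["no initContainer named 'git-sync'"]

    # Literal env values only; entries using valueFrom map to None.
    env = {e["name"]: e.get("value") for e in init.get("env", [])}

    # 1. The volume git-sync writes into must also be mounted by a main container.
    init_volumes = {m["name"] for m in init.get("volumeMounts", [])}
    main_volumes = {
        m["name"]
        for c in spec.get("containers", [])
        for m in c.get("volumeMounts", [])
    }
    if not init_volumes & main_volumes:
        problems.append("git-sync volume is not shared with the worker container")

    # 2. Without one-time mode the initContainer keeps syncing and never exits.
    if str(env.get("GIT_SYNC_ONE_TIME", "")).lower() != "true":
        problems.append('GIT_SYNC_ONE_TIME is not set to "true"')

    # 3. A wait interval only matters for a long-running sync loop.
    if "GIT_SYNC_WAIT" in env:
        problems.append("GIT_SYNC_WAIT has no effect in a one-time initContainer")

    return problems
```

An empty list means the template avoids all three problems. Each message names the one to fix otherwise.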
   
   **What you expected to happen**:
   
   The `git-sync` initContainer syncs a DAG repository to the shared k8s volume and then exits. The airflow worker pod then consumes the shared k8s volume `airflow-dags`. Finally, because `git-sync` always checks the repo out into a nested directory (`GIT_SYNC_ROOT/GIT_SYNC_DEST`), force the destination directory to be named `dags` and mount the volume one directory level up (`/opt/airflow`) on the airflow worker pod, so the DAGs appear at `/opt/airflow/dags`.
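The path arithmetic behind mounting the volume "one directory level up" can be sketched as follows. `worker_dags_path` is a hypothetical helper with two assumptions:

- As in the pod spec below, one volume is mounted at `/git` in the initContainer and at `/opt/airflow` in the worker.
- git-sync places the checkout at `GIT_SYNC_ROOT/GIT_SYNC_DEST`.

```python
import posixpath


def worker_dags_path(
    git_sync_root: str, git_sync_dest: str, init_mount: str, worker_mount: str
) -> str:
    """Translate where git-sync writes the repo into the path the worker sees.

    Hypothetical helper: git-sync writes to <GIT_SYNC_ROOT>/<GIT_SYNC_DEST>.
    The same volume is mounted at init_mount in the initContainer and at
    worker_mount in the worker, so the relative path under the mount is
    preserved.
    """
    synced = posixpath.join(git_sync_root, git_sync_dest)
    relative = posixpath.relpath(synced, init_mount)
    # A path that escapes the mount point would not be visible to the worker.
    if relative == ".." or relative.startswith("../"):
        raise ValueError("GIT_SYNC_ROOT is outside the shared volume mount")
    return posixpath.normpath(posixpath.join(worker_mount, relative))
```

With `GIT_SYNC_ROOT=/git` and `GIT_SYNC_DEST=dags`, this yields `/opt/airflow/dags`. That should match Airflow's default `dags_folder` (`$AIRFLOW_HOME/dags`) when `AIRFLOW_HOME` is `/opt/airflow`, as in the official image.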
   
   ```yaml
   apiVersion: v1
   kind: Pod
   metadata:
     name: dummy-name
   spec:
     initContainers:
       - name: git-sync
         image: "k8s.gcr.io/git-sync:v3.1.6"
         env:
           - name: GIT_SYNC_REV
             value: "HEAD"
           - name: GIT_SYNC_BRANCH
             value: "v1-10-stable"
           - name: GIT_SYNC_REPO
             value: "https://github.com/apache/airflow.git"
           - name: GIT_SYNC_DEPTH
             value: "1"
           - name: GIT_SYNC_ROOT
             value: "/git"
           - name: GIT_SYNC_DEST
             value: "dags"
           - name: GIT_SYNC_ADD_USER
             value: "true"
           - name: GIT_SYNC_ONE_TIME
             value: "true"
           - name: GIT_SYNC_MAX_SYNC_FAILURES
             value: "0"
         volumeMounts:
           - name: airflow-dags
             mountPath: /git
     containers:
       - args: []
         command: []
         env:
           - name: AIRFLOW__CORE__EXECUTOR
             value: LocalExecutor
           # Hard Coded Airflow Envs
           - name: AIRFLOW__CORE__FERNET_KEY
             valueFrom:
               secretKeyRef:
                 name: RELEASE-NAME-fernet-key
                 key: fernet-key
           - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
             valueFrom:
               secretKeyRef:
                 name: RELEASE-NAME-airflow-metadata
                 key: connection
           - name: AIRFLOW_CONN_AIRFLOW_DB
             valueFrom:
               secretKeyRef:
                 name: RELEASE-NAME-airflow-metadata
                 key: connection
         envFrom: []
         image: dummy_image
         imagePullPolicy: IfNotPresent
         name: base
         ports: []
         volumeMounts:
           - mountPath: "/opt/airflow/logs"
             name: airflow-logs
           - mountPath: "/opt/airflow"
             name: airflow-dags
             readOnly: false
     hostNetwork: false
     restartPolicy: Never
     securityContext:
       runAsUser: 50000
     nodeSelector:
       {}
     affinity:
       {}
     tolerations:
       []
     serviceAccountName: 'RELEASE-NAME-worker-serviceaccount'
     volumes:
       - name: airflow-dags
         emptyDir: {}
       - emptyDir: {}
         name: airflow-logs
       - configMap:
           name: RELEASE-NAME-airflow-config
         name: airflow-config
       - configMap:
           name: RELEASE-NAME-airflow-config
         name: airflow-local-settings
   ```
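Kubernetes declares `EnvVar.value` as a string. An unquoted YAML `true` or `0` in a manifest is therefore parsed as a boolean or integer, and the pod can be rejected when it is created. The sketch below is a minimal check over the parsed manifest. It assumes the YAML has been loaded into plain dicts (for example with `yaml.safe_load`). `non_string_env_values` is a hypothetical helper.

```python
from __future__ import annotations

from typing import Any


def non_string_env_values(pod: dict[str, Any]) -> list[str]:
    """List env vars whose literal `value` is not a string.

    Hypothetical helper. Entries that use valueFrom (e.g. secretKeyRef)
    instead of value are skipped, since they carry no literal value.
    """
    spec = pod["spec"]
    offenders: list[str] = []
    for group in ("initContainers", "containers"):
        for container in spec.get(group, []):
            for var in container.get("env", []):
                if "value" in var and not isinstance(var["value"], str):
                    offenders.append(f"{container['name']}: {var['name']}")
    return offenders
```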
   
   **How to reproduce it**:
   
   Try to use the existing [`git-sync` template](https://github.com/astronomer/airflow/blob/master/airflow/kubernetes/pod_template_file_examples/git_sync_template.yaml).
   
   **Anything else we need to know**:
   
   The latest git-sync container image is now v3.2.0 and can be pulled from `k8s.gcr.io/git-sync/git-sync:v3.2.0`.
   
   In my experience, only four `GIT_SYNC_*` environment variables are needed when pulling a DAGs repo from a public Git repository. This assumes the DAGs live in the top-level directory of the repo; otherwise, mounting them on the worker pod requires a custom path.
   ```yaml
   - name: GIT_SYNC_BRANCH
     value: "master"
   - name: GIT_SYNC_REPO
     value: "https://github.com/bpleines/airflow-dags"
   - name: GIT_SYNC_DEST
     value: "dags"
   - name: GIT_SYNC_ONE_TIME
     value: "true"
   ```
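When the DAGs are not at the top level of the repo, the folder Airflow should scan moves down by that subdirectory. The helper below, `dags_folder_for_subdir`, is hypothetical. One possible way to use its result is to pass it to the worker as `AIRFLOW__CORE__DAGS_FOLDER`, Airflow's environment-variable override for `[core] dags_folder`. This is an assumption, not something this issue prescribes.

```python
import posixpath


def dags_folder_for_subdir(
    worker_mount: str, git_sync_dest: str, repo_subdir: str = ""
) -> str:
    """Return the directory Airflow should scan for DAGs.

    Hypothetical helper. With DAGs at the top level of the repo this is
    <worker_mount>/<git_sync_dest>. If they live in repo_subdir inside the
    repo, that suffix is appended and dags_folder must point at it explicitly.
    """
    parts = [worker_mount, git_sync_dest]
    if repo_subdir:
        # Strip slashes so a leading "/" does not make join discard the prefix.
        parts.append(repo_subdir.strip("/"))
    return posixpath.join(*parts)
```

For example, the `apache/airflow` repo used in the pod spec above keeps its examples under `airflow/example_dags`. That gives `/opt/airflow/dags/airflow/example_dags` rather than the top-level `/opt/airflow/dags`.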

