bpleines opened a new issue #11789:
URL: https://github.com/apache/airflow/issues/11789
**Apache Airflow version**: 1.10.12
**Kubernetes version (if you are using kubernetes)** (use `kubectl version`): `1.18.8`
**Environment**:
- **Cloud provider or hardware configuration**:
- **OS** (e.g. from /etc/os-release):
- **Kernel** (e.g. `uname -a`):
- **Install tools**:
- **Others**:
**What happened**:
The included `git-sync` example has a few problems. Fixing them may help adoption of this powerful config option:
1. It does not properly share the Kubernetes volume containing the DAGs between the `git-sync` initContainer and the worker container.
2. It omits the `GIT_SYNC_ONE_TIME` option, which is necessary for the initContainer to exit after syncing.
3. `GIT_SYNC_WAIT` is not applicable, because the initContainer should exit immediately after syncing.
**What you expected to happen**:
The `git-sync` initContainer syncs a DAG repository to the shared k8s volume `airflow-dags` and then exits; the Airflow worker pod then consumes that volume. Because `git-sync` always places the repo inside a nested directory under `GIT_SYNC_ROOT`, the template below forces that destination directory to be named `dags` (via `GIT_SYNC_DEST`) and mounts the volume one directory level up on the worker pod (at `/opt/airflow`), so the DAGs appear at `/opt/airflow/dags`.
```
apiVersion: v1
kind: Pod
metadata:
  name: dummy-name
spec:
  initContainers:
    - name: git-sync
      image: "k8s.gcr.io/git-sync:v3.1.6"
      env:
        - name: GIT_SYNC_REV
          value: "HEAD"
        - name: GIT_SYNC_BRANCH
          value: "v1-10-stable"
        - name: GIT_SYNC_REPO
          value: "https://github.com/apache/airflow.git"
        - name: GIT_SYNC_DEPTH
          value: "1"
        - name: GIT_SYNC_ROOT
          value: "/git"
        - name: GIT_SYNC_DEST
          value: "dags"
        - name: GIT_SYNC_ADD_USER
          value: "true"
        # Exit after the first sync so the worker container can start.
        # Quoted because Kubernetes env values must be strings.
        - name: GIT_SYNC_ONE_TIME
          value: "true"
        - name: GIT_SYNC_MAX_SYNC_FAILURES
          value: "0"
      volumeMounts:
        - name: airflow-dags
          mountPath: /git
  containers:
    - args: []
      command: []
      env:
        - name: AIRFLOW__CORE__EXECUTOR
          value: LocalExecutor
        # Hard Coded Airflow Envs
        - name: AIRFLOW__CORE__FERNET_KEY
          valueFrom:
            secretKeyRef:
              name: RELEASE-NAME-fernet-key
              key: fernet-key
        - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
          valueFrom:
            secretKeyRef:
              name: RELEASE-NAME-airflow-metadata
              key: connection
        - name: AIRFLOW_CONN_AIRFLOW_DB
          valueFrom:
            secretKeyRef:
              name: RELEASE-NAME-airflow-metadata
              key: connection
      envFrom: []
      image: dummy_image
      imagePullPolicy: IfNotPresent
      name: base
      ports: []
      volumeMounts:
        - mountPath: "/opt/airflow/logs"
          name: airflow-logs
        # Same volume as the initContainer, mounted one level up:
        # GIT_SYNC_DEST "dags" shows up at /opt/airflow/dags
        - mountPath: "/opt/airflow"
          name: airflow-dags
          readOnly: false
  hostNetwork: false
  restartPolicy: Never
  securityContext:
    runAsUser: 50000
  nodeSelector:
    {}
  affinity:
    {}
  tolerations:
    []
  serviceAccountName: 'RELEASE-NAME-worker-serviceaccount'
  volumes:
    - name: airflow-dags
      emptyDir: {}
    - emptyDir: {}
      name: airflow-logs
    - configMap:
        name: RELEASE-NAME-airflow-config
      name: airflow-config
    - configMap:
        name: RELEASE-NAME-airflow-config
      name: airflow-local-settings
```
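If mounting the shared volume over all of `/opt/airflow` is undesirable (a volume mount hides whatever the image ships at that path), one possible variant, sketched here and not tested against this template, is to mount the volume at `/git` on the worker as well and point Airflow at the checkout with `AIRFLOW__CORE__DAGS_FOLDER`, the environment-variable form of the `[core] dags_folder` setting. Only the parts of the worker container that change are shown; the `/git` mount path on the worker is an assumption chosen to mirror the initContainer.

```
# Worker container excerpt (sketch, not a complete spec): the shared volume is
# mounted at /git, mirroring the initContainer, instead of over /opt/airflow
containers:
  - name: base
    env:
      # git-sync leaves the checkout at GIT_SYNC_ROOT/GIT_SYNC_DEST, i.e. /git/dags
      - name: AIRFLOW__CORE__DAGS_FOLDER
        value: "/git/dags"
    volumeMounts:
      - mountPath: "/opt/airflow/logs"
        name: airflow-logs
      - mountPath: "/git"
        name: airflow-dags
```

This keeps the image's own `/opt/airflow` contents visible, at the cost of one extra environment variable on the worker.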
**How to reproduce it**:
Try to use the existing [`git-sync` template](https://github.com/astronomer/airflow/blob/master/airflow/kubernetes/pod_template_file_examples/git_sync_template.yaml).
**Anything else we need to know**:
The latest git-sync container is now version v3.2.0 and can be pulled from
`k8s.gcr.io/git-sync/git-sync:v3.2.0`.
In my experience, only four `GIT_SYNC_*` environment variables are needed when
pulling a DAGs repo from a public Git repository. This assumes the DAGs are in
the top-level directory of the repo; otherwise, mounting them on the worker pod
requires a custom path.
```
- name: GIT_SYNC_BRANCH
value: "master"
- name: GIT_SYNC_REPO
value: "https://github.com/bpleines/airflow-dags"
- name: GIT_SYNC_DEST
value: "dags"
- name: GIT_SYNC_ONE_TIME
value: "true"
```
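For a repository whose DAGs live in a subdirectory rather than at the top level, one way to supply that custom path, offered as an unverified sketch, is to keep the volume mounts from the full template above unchanged and override Airflow's DAG folder on the worker container. With `GIT_SYNC_DEST` set to `dags` and the volume mounted at `/opt/airflow`, the checkout appears at `/opt/airflow/dags`; `my_dags` is a hypothetical subdirectory name.

```
# Worker container env excerpt (sketch); "my_dags" is a hypothetical
# subdirectory of the DAG repository that holds the DAG files
- name: AIRFLOW__CORE__DAGS_FOLDER
  value: "/opt/airflow/dags/my_dags"
```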
----------------------------------------------------------------
This is an automated message from the Apache Git Service.