pengchen created AIRFLOW-2642:
---------------------------------
Summary: [kubernetes executor worker] the value of git-sync init
container ENV GIT_SYNC_ROOT is wrong
Key: AIRFLOW-2642
URL: https://issues.apache.org/jira/browse/AIRFLOW-2642
Project: Apache Airflow
Issue Type: Bug
Components: contrib
Affects Versions: 2.0.0, 1.10
Reporter: pengchen
Assignee: pengchen
Fix For: 1.10
There are two ways of syncing DAGs: a PVC and git-sync. When git-sync is used,
the generated worker pod YAML contains the following fragment:
{code:java}
worker container:
-------------------------------
containers:
- args:
  - airflow run tutorial1 print_date 2018-06-19T07:57:15.011693+00:00 --local -sd /root/airflow/dags/dags/example_dags/tutorial1.py
  command:
  - bash
  - -cx
  - --
  env:
  - name: AIRFLOW__CORE__AIRFLOW_HOME
    value: /root/airflow
  - name: AIRFLOW__CORE__EXECUTOR
    value: LocalExecutor
  - name: AIRFLOW__CORE__DAGS_FOLDER
    value: /tmp/dags
  - name: SQL_ALCHEMY_CONN
    valueFrom:
      secretKeyRef:
        key: sql_alchemy_conn
        name: airflow-secrets
init container:
-------------------------------
initContainers:
- env:
  - name: GIT_SYNC_REPO
    value: https://code.devops.xiaohongshu.com/pengchen/Airflow-DAGs.git
  - name: GIT_SYNC_BRANCH
    value: master
  - name: GIT_SYNC_ROOT
    value: /tmp
  - name: GIT_SYNC_DEST
    value: dags
  - name: GIT_SYNC_ONE_TIME
    value: "true"
  - name: GIT_SYNC_USERNAME
    value: XXX
  - name: GIT_SYNC_PASSWORD
    value: XXX
  image: library/git-sync-amd64:v2.0.5
  imagePullPolicy: IfNotPresent
  name: git-sync-clone
  resources: {}
  securityContext:
    runAsUser: 0
  terminationMessagePath: /dev/termination-log
  terminationMessagePolicy: File
  volumeMounts:
  - mountPath: /root/airflow/dags/
    name: airflow-dags
  - mountPath: /root/airflow/logs
    name: airflow-logs
  - mountPath: /root/airflow/airflow.cfg
    name: airflow-config
    readOnly: true
    subPath: airflow.cfg
  - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
    name: default-token-xz87t
    readOnly: true
{code}
According to this configuration, git-sync synchronizes the DAGs to the /tmp/dags
directory (GIT_SYNC_ROOT/GIT_SYNC_DEST). However, the worker container's command
args (airflow run tutorial1 print_date 2018-06-19T07:57:15.011693+00:00 --local
-sd /root/airflow/dags/dags/example_dags/tutorial1.py) are generated by the
scheduler and point to a path under /root/airflow/dags/, which git-sync did not
populate. The task therefore fails with the following error:
{code:java}
+ airflow run tutorial1 print_date 2018-06-19T07:57:15.011693+00:00 --local -sd /root/airflow/dags/dags/example_dags/tutorial1.py
[2018-06-19 07:57:29,075] {settings.py:174} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800
[2018-06-19 07:57:29,232] {__init__.py:51} INFO - Using executor LocalExecutor
[2018-06-19 07:57:29,373] {models.py:219} INFO - Filling up the DagBag from /root/airflow/dags/dags/example_dags/tutorial1.py
[2018-06-19 07:57:29,648] {models.py:310} INFO - File /usr/local/lib/python2.7/dist-packages/airflow/example_dags/__init__.py assumed to contain no DAGs. Skipping.
Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 32, in <module>
    args.func(args)
  File "/usr/local/lib/python2.7/dist-packages/airflow/utils/cli.py", line 74, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 475, in run
    dag = get_dag(args)
  File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 146, in get_dag
    'parse.'.format(args.dag_id))
airflow.exceptions.AirflowException: dag_id could not be found: tutorial1. Either the dag did not exist or it failed to parse.
{code}
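To make the path mismatch concrete, here is a minimal Python sketch. It assumes, as described in this report, that git-sync publishes the checkout at GIT_SYNC_ROOT/GIT_SYNC_DEST. The helper names git_sync_checkout_path and is_on_shared_volume are hypothetical and exist only for illustration; they are not part of Airflow or git-sync.

```python
from pathlib import PurePosixPath


def git_sync_checkout_path(env):
    """Assumed git-sync behaviour: the checkout lives at GIT_SYNC_ROOT/GIT_SYNC_DEST."""
    return PurePosixPath(env["GIT_SYNC_ROOT"]) / env["GIT_SYNC_DEST"]


def is_on_shared_volume(path, mount_paths):
    """Return True if path equals, or lies below, one of the volume mount paths."""
    path = PurePosixPath(path)
    mounts = [PurePosixPath(m) for m in mount_paths]
    return any(path == m or m in path.parents for m in mounts)


# Values copied from the pod fragment in this report.
init_env = {"GIT_SYNC_ROOT": "/tmp", "GIT_SYNC_DEST": "dags"}
dags_mount = "/root/airflow/dags/"
requested_file = "/root/airflow/dags/dags/example_dags/tutorial1.py"

checkout = git_sync_checkout_path(init_env)
print(checkout)                                           # /tmp/dags
print(is_on_shared_volume(checkout, [dags_mount]))        # False
print(is_on_shared_volume(requested_file, [dags_mount]))  # True
```

The checkout path is outside the airflow-dags mount, while the file requested via -sd is inside it, which is consistent with the "dag_id could not be found" error.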
The traceback shows that the worker cannot find the corresponding DAG, so I
think the environment variable GIT_SYNC_ROOT should be consistent with
dags_volume_mount_path.
The worker's environment variable AIRFLOW__CORE__DAGS_FOLDER (set to /tmp/dags)
is also invalid.
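One possible shape of the proposed fix, sketched in Python: derive GIT_SYNC_ROOT from dags_volume_mount_path instead of hard-coding /tmp. The function git_sync_init_env and the placeholder repository URL are assumptions made for illustration only; this is not the actual Airflow implementation.

```python
import posixpath


def git_sync_init_env(repo, branch, dags_volume_mount_path, dest="dags"):
    """Hypothetical helper: build the init-container env with GIT_SYNC_ROOT
    tied to the DAGs volume mount path rather than a fixed /tmp."""
    root = dags_volume_mount_path.rstrip("/") or "/"
    return [
        {"name": "GIT_SYNC_REPO", "value": repo},
        {"name": "GIT_SYNC_BRANCH", "value": branch},
        {"name": "GIT_SYNC_ROOT", "value": root},
        {"name": "GIT_SYNC_DEST", "value": dest},
        {"name": "GIT_SYNC_ONE_TIME", "value": "true"},
    ]


env = git_sync_init_env(
    repo="https://example.invalid/Airflow-DAGs.git",  # placeholder URL
    branch="master",
    dags_volume_mount_path="/root/airflow/dags/",
)
values = {item["name"]: item["value"] for item in env}

# Assumed git-sync layout: checkout at GIT_SYNC_ROOT/GIT_SYNC_DEST.
checkout = posixpath.join(values["GIT_SYNC_ROOT"], values["GIT_SYNC_DEST"])
print(checkout)  # /root/airflow/dags/dags

# The path the scheduler passed via -sd now falls inside the checkout.
requested_file = "/root/airflow/dags/dags/example_dags/tutorial1.py"
print(requested_file.startswith(checkout + "/"))  # True
```

With this layout the checkout lands on the shared airflow-dags volume, so the path generated by the scheduler would resolve inside the worker container.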