Deebol opened a new issue, #61396:
URL: https://github.com/apache/airflow/issues/61396
### Apache Airflow version
3.1.6
### If "Other Airflow 3 version" selected, which one?
3.1.5
### What happened?
Hi.
In Airflow 3.1.6 (and 3.1.5; the issue does not occur on 3.1.3), GitDagBundle
with LocalExecutor performs a full git clone into a new directory (new inode)
and deletes the old one almost every time a task starts.
This "inode flipping" causes running tasks (e.g. dbt via Cosmos) to lose
their file descriptors to the DAG folder, resulting in FileNotFoundError or
"Directory not found" errors.
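To illustrate the failure mode described above, here is a minimal sketch, assuming a Linux filesystem. It is not Airflow code: the function name `simulate_inode_flip`, the `abc123` version directory and the `dag.py` file are hypothetical. It only shows that deleting and recreating a directory at the same path yields a new inode, and that a handle opened on the old directory can no longer find its files.

```python
import os
import shutil
import tempfile


def simulate_inode_flip(root):
    """Recreate a directory at the same path and report what an old handle sees."""
    # Hypothetical stand-in for a bundle version directory.
    version_dir = os.path.join(root, "versions", "abc123")
    os.makedirs(version_dir)
    with open(os.path.join(version_dir, "dag.py"), "w") as f:
        f.write("# placeholder DAG file\n")

    inode_before = os.stat(version_dir).st_ino

    # A long-running task keeps a handle on the directory, much like a
    # process whose working directory is inside the bundle.
    dir_fd = os.open(version_dir, os.O_RDONLY)
    try:
        # The "flip": delete the directory and create a fresh one at the same path.
        shutil.rmtree(version_dir)
        os.makedirs(version_dir)
        with open(os.path.join(version_dir, "dag.py"), "w") as f:
            f.write("# placeholder DAG file\n")

        inode_after = os.stat(version_dir).st_ino

        # Look the file up relative to the old handle; the old directory is
        # gone, so the lookup fails even though the path exists again.
        try:
            os.stat("dag.py", dir_fd=dir_fd)
            old_handle_ok = True
        except FileNotFoundError:
            old_handle_ok = False
    finally:
        os.close(dir_fd)

    return inode_before, inode_after, old_handle_ok


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        before, after, ok = simulate_inode_flip(root)
        print(f"inode before={before} after={after} old handle usable={ok}")
```

A task that resolves paths from scratch after the flip would see the new directory, so the errors are expected mainly for processes that hold a handle or working directory inside the old one, or that read during the delete/clone window.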
I ran the following script inside the Airflow scheduler pod to detect ongoing git
operations:
```
python - <<'PY'
import os, time
from datetime import datetime, timezone

# Version directory to watch; replace <provide sha> with the bundle's commit SHA.
TARGET = "/tmp/airflow/dag_bundles/dh-pipeline-dags/versions/<provide sha>"
INTERVAL = 0.05  # polling interval in seconds

def now():
    return datetime.now(timezone.utc).strftime("%H:%M:%S.%f")[:-3]

seen_pids = set()
print(f"{now()} START monitoring GIT processes...")
while True:
    try:
        # Every numeric entry in /proc is a live process ID.
        pids = [pid for pid in os.listdir('/proc') if pid.isdigit()]
        for pid in pids:
            if pid not in seen_pids:
                try:
                    with open(f"/proc/{pid}/cmdline", "rb") as f:
                        cmd = f.read().replace(b"\x00", b" ").decode("utf-8", "ignore").strip()
                    if "git" in cmd.lower():
                        # Field 4 of /proc/<pid>/stat is the parent PID.
                        with open(f"/proc/{pid}/stat", "rb") as f:
                            stat_parts = f.read().split()
                        ppid = stat_parts[3].decode()
                        # Record the inode of the watched directory at detection time.
                        inode_info = "N/A"
                        if os.path.exists(TARGET):
                            try:
                                inode_info = os.stat(TARGET).st_ino
                            except OSError:
                                pass
                        print(f"{now()} GIT DETECTED! PID={pid} PPID={ppid} INODE_BASE={inode_info}")
                        print(f" CMD: {cmd[:150]}")
                        seen_pids.add(pid)
                except (FileNotFoundError, ProcessLookupError):
                    # The process exited between listing and reading.
                    continue
    except Exception as e:
        print(f"Error: {e}")
    # Keep the set of already-reported PIDs from growing without bound.
    if len(seen_pids) > 1000:
        seen_pids.clear()
    time.sleep(INTERVAL)
PY
```
And this is what I got on 3.1.6:
```
...
13:33:06.182 GIT DETECTED! PID=16645 PPID=16640 INODE_BASE=139948383
CMD: /usr/lib/git-core/git rev-list --objects --stdin --not --all --quiet --alternate-refs
13:33:06.337 GIT DETECTED! PID=16648 PPID=16637 INODE_BASE=406416615
CMD: git clone -v -- /tmp/airflow/dag_bundles/dh-pipeline-dags/bare /tmp/airflow/dag_bundles/dh-pipeline-dags/versions/1108133dd6b2afb806d3531e9a97f8b15994
13:33:06.337 GIT DETECTED! PID=16649 PPID=16648 INODE_BASE=406416615
CMD: /bin/sh -c git-upload-pack '/tmp/airflow/dag_bundles/dh-pipeline-dags/bare' git-upload-pack '/tmp/airflow/dag_bundles/dh-pipeline-dags/bare'
13:33:06.337 GIT DETECTED! PID=16650 PPID=16649 INODE_BASE=406416615
CMD: git-upload-pack /tmp/airflow/dag_bundles/dh-pipeline-dags/bare
13:33:06.491 GIT DETECTED! PID=16652 PPID=16637 INODE_BASE=406416615
CMD: git cat-file --batch-check
13:33:06.491 GIT DETECTED! PID=16653 PPID=16637 INODE_BASE=406416615
CMD: git reset --hard HEAD --
13:33:09.825 GIT DETECTED! PID=16658 PPID=16655 INODE_BASE=406416615
CMD: git fetch -v -- origin +refs/heads/*:refs/heads/* +refs/tags/*:refs/tags/*
13:33:10.339 GIT DETECTED! PID=16666 PPID=16655 INODE_BASE=473240196
CMD: git clone -v -- /tmp/airflow/dag_bundles/dh-pipeline-dags/bare /tmp/airflow/dag_bundles/dh-pipeline-dags/versions/1108133dd6b2afb806d3531e9a97f8b15994
13:33:10.339 GIT DETECTED! PID=16667 PPID=16666 INODE_BASE=473240196
CMD: /bin/sh -c git-upload-pack '/tmp/airflow/dag_bundles/dh-pipeline-dags/bare' git-upload-pack '/tmp/airflow/dag_bundles/dh-pipeline-dags/bare'
13:33:10.339 GIT DETECTED! PID=16668 PPID=16667 INODE_BASE=473240196
CMD: git-upload-pack /tmp/airflow/dag_bundles/dh-pipeline-dags/bare
13:33:10.441 GIT DETECTED! PID=16669 PPID=16655 INODE_BASE=473240196
CMD: git checkout qa/deployed
...
```
A `git clone` runs repeatedly, and INODE_BASE changes after each one.
On 3.1.3 I got:
```
...
13:16:45.591 GIT DETECTED! PID=354362 PPID=354358 INODE_BASE=14307405
CMD: git checkout dev/deployed
13:16:46.580 GIT DETECTED! PID=354373 PPID=354366 INODE_BASE=14307405
CMD: git cat-file --batch-check
13:16:49.630 GIT DETECTED! PID=354397 PPID=354393 INODE_BASE=14307405
CMD: git checkout dev/deployed
13:16:49.990 GIT DETECTED! PID=354405 PPID=354400 INODE_BASE=14307405
CMD: git cat-file --batch-check
13:16:49.990 GIT DETECTED! PID=354406 PPID=354400 INODE_BASE=14307405
CMD: git reset --hard HEAD --
13:17:01.411 GIT DETECTED! PID=354439 PPID=354434 INODE_BASE=14307405
CMD: git cat-file --batch-check
13:17:01.411 GIT DETECTED! PID=354440 PPID=354434 INODE_BASE=14307405
CMD: git reset --hard HEAD --
13:17:04.472 GIT DETECTED! PID=354443 PPID=354441 INODE_BASE=14307405
CMD: git version
...
```
There are no `git clone` calls, and INODE_BASE does not change.
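The comparison between the two outputs can also be made mechanically. The following is a minimal sketch, assuming the monitor output has been saved as text in the format printed by the script above; the `summarize` helper and its result keys are hypothetical names, and the sample paths are shortened placeholders.

```python
import re

# Matches the detection lines printed by the monitoring script.
DETECT_RE = re.compile(r"GIT DETECTED! PID=(\d+) PPID=(\d+) INODE_BASE=(\S+)")


def summarize(log_text):
    """Count git clone commands and distinct consecutive INODE_BASE values."""
    inode_sequence = []
    clone_count = 0
    for line in log_text.splitlines():
        match = DETECT_RE.search(line)
        if match:
            inode = match.group(3)
            # Record the inode only when it differs from the previous one.
            if not inode_sequence or inode_sequence[-1] != inode:
                inode_sequence.append(inode)
        elif line.strip().startswith("CMD: git clone"):
            clone_count += 1
    return {
        "inode_sequence": inode_sequence,
        "inode_changes": max(len(inode_sequence) - 1, 0),
        "clone_count": clone_count,
    }


if __name__ == "__main__":
    sample = "\n".join([
        "13:33:06.182 GIT DETECTED! PID=16645 PPID=16640 INODE_BASE=139948383",
        "CMD: /usr/lib/git-core/git rev-list --objects --stdin",
        "13:33:06.337 GIT DETECTED! PID=16648 PPID=16637 INODE_BASE=406416615",
        "CMD: git clone -v -- /tmp/bare /tmp/versions/placeholder",
        "13:33:10.339 GIT DETECTED! PID=16666 PPID=16655 INODE_BASE=473240196",
        "CMD: git clone -v -- /tmp/bare /tmp/versions/placeholder",
    ])
    print(summarize(sample))
```

A healthy run, like the 3.1.3 output, should report zero clones and zero inode changes.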
### What you think should happen instead?
_No response_
### How to reproduce
1. Set up Airflow 3.1.6 with LocalExecutor.
2. Configure a GitDagBundle.
3. Run multiple tasks that read files from the bundle directory (such as a
DbtSelectOperator from Cosmos).
### Operating System
Debian GNU/Linux 12 (bookworm)
### Versions of Apache Airflow Providers
apache-airflow==3.1.6
apache-airflow-core==3.1.6
apache-airflow-providers-airbyte==5.3.1
apache-airflow-providers-amazon==9.19.0
apache-airflow-providers-common-compat==1.11.0
apache-airflow-providers-common-io==1.7.0
apache-airflow-providers-common-sql==1.30.2
apache-airflow-providers-datadog==3.10.1
apache-airflow-providers-http==5.6.2
apache-airflow-providers-microsoft-azure==12.10.1
apache-airflow-providers-mongo==5.3.1
apache-airflow-providers-smtp==2.4.1
apache-airflow-providers-standard==1.10.2
apache-airflow-task-sdk==1.1.6
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
```
- name: AIRFLOW__DAG_PROCESSOR__DEFAULT_BUNDLE_NAME
  value: "dh-pipeline-dags"
- name: AIRFLOW__DAG_PROCESSOR__DAG_BUNDLE_CONFIG_LIST
  value: >
    [
      {
        "name": "dh-pipeline-dags",
        "classpath": "airflow.providers.git.bundles.git.GitDagBundle",
        "kwargs": {
          "tracking_ref": "qa/deployed",
          "subdir": "dags",
          "git_conn_id": "git_xc_dh_pipeline",
          "refresh_interval": 300
        }
      }
    ]
```
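When reproducing this, it can help to rule out a malformed bundle list first. Below is a minimal sketch, assuming only that the value is a JSON list whose entries carry the three keys used in the configuration above. The `check_bundle_config` helper is a hypothetical name and is not part of Airflow.

```python
import json

# Keys present in the bundle entry shown above; treated here as required.
EXPECTED_KEYS = {"name", "classpath", "kwargs"}


def check_bundle_config(raw):
    """Parse a bundle config list and return the bundle names it defines."""
    bundles = json.loads(raw)
    if not isinstance(bundles, list):
        raise ValueError("expected a JSON list of bundle definitions")
    names = []
    for entry in bundles:
        missing = EXPECTED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"bundle entry is missing keys: {sorted(missing)}")
        names.append(entry["name"])
    return names


if __name__ == "__main__":
    raw = """
    [
      {
        "name": "dh-pipeline-dags",
        "classpath": "airflow.providers.git.bundles.git.GitDagBundle",
        "kwargs": {
          "tracking_ref": "qa/deployed",
          "subdir": "dags",
          "git_conn_id": "git_xc_dh_pipeline",
          "refresh_interval": 300
        }
      }
    ]
    """
    print(check_bundle_config(raw))
```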
### Anything else?
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)