Deebol opened a new issue, #61396:
URL: https://github.com/apache/airflow/issues/61396

   ### Apache Airflow version
   
   3.1.6
   
   ### If "Other Airflow 3 version" selected, which one?
   
   3.1.5
   
   ### What happened?
   
   Hi.
   
   In Airflow 3.1.6 (and 3.1.5; the issue does not occur on 3.1.3),
   GitDagBundle with LocalExecutor performs a full git clone into a new
   directory (new inode) and deletes the old one almost every time a task
   starts.
   
   This "inode flipping" leaves running tasks (e.g. dbt via Cosmos) holding
   file descriptors to a DAG folder that no longer exists, resulting in
   FileNotFoundError or "Directory not found" errors.
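
The failure mode described above can be reproduced outside Airflow. The following is a minimal sketch, not Airflow's actual code: `swap_directory` and `demo` are hypothetical helpers, `shutil.copytree` stands in for the fresh clone, and the stale-handle behaviour assumes Linux semantics for `os.open(..., dir_fd=...)`.

```python
import os
import shutil
import tempfile


def swap_directory(path: str) -> tuple[int, int]:
    """Replace `path` with a freshly built copy; return (old_inode, new_inode)."""
    old_inode = os.stat(path).st_ino
    fresh = path + ".new"
    shutil.copytree(path, fresh)  # stand-in for a fresh clone (assumption)
    shutil.rmtree(path)           # the old directory is deleted
    os.rename(fresh, path)        # the new directory takes over the same path
    return old_inode, os.stat(path).st_ino


def demo() -> dict:
    root = tempfile.mkdtemp()
    bundle = os.path.join(root, "bundle")
    os.mkdir(bundle)
    with open(os.path.join(bundle, "model.sql"), "w") as f:
        f.write("select 1\n")

    # A long-running task holds a handle on the bundle directory.
    dir_fd = os.open(bundle, os.O_RDONLY)
    try:
        old_inode, new_inode = swap_directory(bundle)
        try:
            # Lookup relative to the handle opened before the swap.
            fd = os.open("model.sql", os.O_RDONLY, dir_fd=dir_fd)
            os.close(fd)
            stale_handle_works = True
        except FileNotFoundError:
            stale_handle_works = False
        # A lookup by path resolves to the new directory and still succeeds.
        path_lookup_works = os.path.exists(os.path.join(bundle, "model.sql"))
    finally:
        os.close(dir_fd)
        shutil.rmtree(root)

    return {
        "inode_changed": old_inode != new_inode,
        "stale_handle_works": stale_handle_works,
        "path_lookup_works": path_lookup_works,
    }


if __name__ == "__main__":
    print(demo())
```

The path itself stays valid after the swap, which is why only processes that were already working inside the old directory are affected.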
   
   I ran the following script inside the Airflow scheduler pod to detect
   ongoing git operations:
   ```
   python - <<'PY'
   import os, time, subprocess
   from datetime import datetime, timezone
   
   TARGET = "/tmp/airflow/dag_bundles/dh-pipeline-dags/versions/<provide sha>"
   INTERVAL = 0.05
   
   def now():
     return datetime.now(timezone.utc).strftime("%H:%M:%S.%f")[:-3]
   
   seen_pids = set()
   
   print(f"{now()} START monitoring GIT processes...")
   
   while True:
     try:
       pids = [pid for pid in os.listdir('/proc') if pid.isdigit()]
       for pid in pids:
         if pid not in seen_pids:
           try:
             with open(f"/proc/{pid}/cmdline", "rb") as f:
               cmd = f.read().replace(b"\x00", b" ").decode("utf-8", "ignore").strip()
   
             if "git" in cmd.lower():
               with open(f"/proc/{pid}/stat", "rb") as f:
                 stat_parts = f.read().split()
                 ppid = stat_parts[3].decode()
   
               inode_info = "N/A"
               if os.path.exists(TARGET):
                 try:
                   inode_info = os.stat(TARGET).st_ino
                 except OSError:
                   # The directory may vanish between exists() and stat().
                   pass
   
               print(f"{now()} GIT DETECTED! PID={pid} PPID={ppid} INODE_BASE={inode_info}")
               print(f"   CMD: {cmd[:150]}")
   
               seen_pids.add(pid)
           except (FileNotFoundError, ProcessLookupError):
             continue
     except Exception as e:
       print(f"Error: {e}")
   
     if len(seen_pids) > 1000:
       seen_pids.clear()
   
     time.sleep(INTERVAL)
   PY
   
   ```
   And this is what I got on 3.1.6:
   ```
   ...
   13:33:06.182 GIT DETECTED! PID=16645 PPID=16640 INODE_BASE=139948383
      CMD: /usr/lib/git-core/git rev-list --objects --stdin --not --all --quiet --alternate-refs
   13:33:06.337 GIT DETECTED! PID=16648 PPID=16637 INODE_BASE=406416615
      CMD: git clone -v -- /tmp/airflow/dag_bundles/dh-pipeline-dags/bare /tmp/airflow/dag_bundles/dh-pipeline-dags/versions/1108133dd6b2afb806d3531e9a97f8b15994
   13:33:06.337 GIT DETECTED! PID=16649 PPID=16648 INODE_BASE=406416615
      CMD: /bin/sh -c git-upload-pack '/tmp/airflow/dag_bundles/dh-pipeline-dags/bare' git-upload-pack '/tmp/airflow/dag_bundles/dh-pipeline-dags/bare'
   13:33:06.337 GIT DETECTED! PID=16650 PPID=16649 INODE_BASE=406416615
      CMD: git-upload-pack /tmp/airflow/dag_bundles/dh-pipeline-dags/bare
   13:33:06.491 GIT DETECTED! PID=16652 PPID=16637 INODE_BASE=406416615
      CMD: git cat-file --batch-check
   13:33:06.491 GIT DETECTED! PID=16653 PPID=16637 INODE_BASE=406416615
      CMD: git reset --hard HEAD --
   13:33:09.825 GIT DETECTED! PID=16658 PPID=16655 INODE_BASE=406416615
      CMD: git fetch -v -- origin +refs/heads/*:refs/heads/* +refs/tags/*:refs/tags/*
   13:33:10.339 GIT DETECTED! PID=16666 PPID=16655 INODE_BASE=473240196
      CMD: git clone -v -- /tmp/airflow/dag_bundles/dh-pipeline-dags/bare /tmp/airflow/dag_bundles/dh-pipeline-dags/versions/1108133dd6b2afb806d3531e9a97f8b15994
   13:33:10.339 GIT DETECTED! PID=16667 PPID=16666 INODE_BASE=473240196
      CMD: /bin/sh -c git-upload-pack '/tmp/airflow/dag_bundles/dh-pipeline-dags/bare' git-upload-pack '/tmp/airflow/dag_bundles/dh-pipeline-dags/bare'
   13:33:10.339 GIT DETECTED! PID=16668 PPID=16667 INODE_BASE=473240196
      CMD: git-upload-pack /tmp/airflow/dag_bundles/dh-pipeline-dags/bare
   13:33:10.441 GIT DETECTED! PID=16669 PPID=16655 INODE_BASE=473240196
      CMD: git checkout qa/deployed
   ...
   ```
   
   A `git clone` runs in each cycle and INODE_BASE changes (139948383, then
   406416615, then 473240196 within about four seconds).
   
   On 3.1.3 I got:
   ```
   ...
   13:16:45.591 GIT DETECTED! PID=354362 PPID=354358 INODE_BASE=14307405
      CMD: git checkout dev/deployed
   13:16:46.580 GIT DETECTED! PID=354373 PPID=354366 INODE_BASE=14307405
      CMD: git cat-file --batch-check
   13:16:49.630 GIT DETECTED! PID=354397 PPID=354393 INODE_BASE=14307405
      CMD: git checkout dev/deployed
   13:16:49.990 GIT DETECTED! PID=354405 PPID=354400 INODE_BASE=14307405
      CMD: git cat-file --batch-check
   13:16:49.990 GIT DETECTED! PID=354406 PPID=354400 INODE_BASE=14307405
      CMD: git reset --hard HEAD --
   13:17:01.411 GIT DETECTED! PID=354439 PPID=354434 INODE_BASE=14307405
      CMD: git cat-file --batch-check
   13:17:01.411 GIT DETECTED! PID=354440 PPID=354434 INODE_BASE=14307405
      CMD: git reset --hard HEAD --
   13:17:04.472 GIT DETECTED! PID=354443 PPID=354441 INODE_BASE=14307405
      CMD: git version
   ...
   ```
   
   No `git clone` runs and INODE_BASE stays constant (14307405).
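
To compare longer captures of the two versions, the monitor output can be summarized mechanically. This is a rough sketch that assumes the exact line format printed by the script above; `summarize` and `LINE_RE` are hypothetical names introduced here.

```python
import re

# Matches the header line printed by the monitoring script.
LINE_RE = re.compile(r"PID=(\d+) PPID=(\d+) INODE_BASE=(\S+)")


def summarize(log_text: str) -> dict:
    """Return the sequence of distinct consecutive inodes and the clone count."""
    inode_sequence = []
    clone_count = 0
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            inode = match.group(3)
            # Record an inode only when it differs from the previous one.
            if not inode_sequence or inode_sequence[-1] != inode:
                inode_sequence.append(inode)
        elif "CMD: git clone" in line:
            clone_count += 1
    return {"inode_sequence": inode_sequence, "clone_count": clone_count}


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py < monitor_output.txt
    print(summarize(sys.stdin.read()))
```

An `inode_sequence` with more than one entry means the version directory was replaced during the capture.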
   
   ### What you think should happen instead?
   
   _No response_
   
   ### How to reproduce
   
   1. Set up Airflow 3.1.6 with LocalExecutor.
   2. Configure a GitDagBundle.
   3. Run multiple tasks that read files from the bundle directory (like a
      DbtSelectOperator from Cosmos).
   
   ### Operating System
   
   Debian GNU/Linux 12 (bookworm)
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow==3.1.6
   apache-airflow-core==3.1.6
   apache-airflow-providers-airbyte==5.3.1
   apache-airflow-providers-amazon==9.19.0
   apache-airflow-providers-common-compat==1.11.0
   apache-airflow-providers-common-io==1.7.0
   apache-airflow-providers-common-sql==1.30.2
   apache-airflow-providers-datadog==3.10.1
   apache-airflow-providers-http==5.6.2
   apache-airflow-providers-microsoft-azure==12.10.1
   apache-airflow-providers-mongo==5.3.1
   apache-airflow-providers-smtp==2.4.1
   apache-airflow-providers-standard==1.10.2
   apache-airflow-task-sdk==1.1.6
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   ```
       - name: AIRFLOW__DAG_PROCESSOR__DEFAULT_BUNDLE_NAME
         value: "dh-pipeline-dags"
       - name: AIRFLOW__DAG_PROCESSOR__DAG_BUNDLE_CONFIG_LIST
         value: >
           [
             {
               "name": "dh-pipeline-dags",
               "classpath": "airflow.providers.git.bundles.git.GitDagBundle",
               "kwargs": {
                 "tracking_ref": "qa/deployed",
                 "subdir": "dags",
                 "git_conn_id": "git_xc_dh_pipeline",
                 "refresh_interval": 300
               }
             }
           ]
   ```
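
For anyone reproducing this, a quick sanity check of the JSON in `AIRFLOW__DAG_PROCESSOR__DAG_BUNDLE_CONFIG_LIST` can rule out a malformed value. This is only a sketch and not Airflow's own loader: `load_bundle_configs` is a hypothetical helper, and treating `name` and `classpath` as required is an assumption based on the snippet above.

```python
import json
import os


def load_bundle_configs(raw: str) -> list:
    """Parse the bundle config list and check the keys used in this report."""
    bundles = json.loads(raw)
    if not isinstance(bundles, list):
        raise ValueError("expected a JSON list of bundle definitions")
    for bundle in bundles:
        # Assumption: every entry carries at least these two keys.
        missing = {"name", "classpath"} - set(bundle)
        if missing:
            raise ValueError(f"bundle entry is missing keys: {sorted(missing)}")
    return bundles


if __name__ == "__main__":
    raw = os.environ.get("AIRFLOW__DAG_PROCESSOR__DAG_BUNDLE_CONFIG_LIST", "[]")
    for bundle in load_bundle_configs(raw):
        kwargs = bundle.get("kwargs", {})
        print(bundle["name"], kwargs.get("tracking_ref"), kwargs.get("refresh_interval"))
```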
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

