tan-morgan-kim commented on issue #42542:
URL: https://github.com/apache/airflow/issues/42542#issuecomment-4203492690

We ran into this on Airflow 2.9.2 with git-sync v4 on EKS (official Helm chart).
   
**Root cause**: `DagFileProcessorManager.get_dag_directory()` calls `Path.resolve()`, which follows the git-sync worktree symlink (`repo -> .worktrees/{commit-hash}`). The commit hash changes whenever git-sync checks out a new commit, so the `processor_subdir` stored in the DB for DAGs parsed earlier no longer equals the current resolved path. As a result, `deactivate_deleted_dags()` never matches the orphan DAGs and never deactivates them.
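A minimal standard-library sketch of the mechanism, assuming a POSIX filesystem where `os.symlink` is allowed. It builds a throwaway git-sync-like layout (the commit hashes are made up), repoints `repo` at a new worktree, and compares resolved and unresolved paths. The atomic rename in `repoint` is an assumption for the demo, not necessarily how git-sync swaps its link.

```python
import os
import tempfile
from pathlib import Path


def repoint(link, target):
    """Repoint `link` at `target` atomically: create a temporary link, then rename it over the old one."""
    tmp = link.with_name(link.name + ".tmp")
    os.symlink(target, tmp)
    os.replace(tmp, link)


def simulate_sync(root):
    """Return (resolved, unresolved) paths of `repo` before and after one new commit."""
    for commit in ("aaaa111", "bbbb222"):  # hypothetical commit hashes
        (root / ".worktrees" / commit).mkdir(parents=True)
    repo = root / "repo"

    # repo -> .worktrees/aaaa111 (relative target, like the git-sync layout)
    os.symlink(Path(".worktrees") / "aaaa111", repo)
    before = (str(repo.resolve()), str(repo))

    repoint(repo, Path(".worktrees") / "bbbb222")  # a new commit is checked out
    after = (str(repo.resolve()), str(repo))
    return before, after


with tempfile.TemporaryDirectory() as tmp:
    before, after = simulate_sync(Path(tmp))

# Index 0: resolved (pre-patch get_dag_directory); index 1: unresolved (post-patch).
print(before[0] != after[0])  # True: the resolved path changes with the commit
print(before[1] == after[1])  # True: the unresolved path is stable
```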
   
**Evidence from a running cluster**:
- Active DAG: `processor_subdir='/opt/airflow/dags/.worktrees/e272882...'` (current hash)
- Orphan DAG: `processor_subdir='/opt/airflow/dags/.worktrees/d5ecf60c...'` (old hash)
- Both have `fileloc='/opt/airflow/dags/repo/dags/...'` (unresolved, so consistent across syncs)
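The evidence above can be replayed against a deliberately simplified model of the deactivation filter. This is not Airflow's code. It only assumes that a DAG is deactivated when its `processor_subdir` equals the scheduler's current DAG directory and its `fileloc` is no longer among the parsed files. `NEW_HASH`/`OLD_HASH` and the `.py` file names are placeholders.

```python
from dataclasses import dataclass


@dataclass
class DagRow:
    """Hypothetical cut-down stand-in for a row of Airflow's `dag` table."""
    dag_id: str
    fileloc: str
    processor_subdir: str


def deactivation_candidates(rows, current_subdir, alive_filelocs):
    """Simplified model (not Airflow's code): only rows recorded under the current
    subdir are examined, and those whose file is no longer parsed are returned."""
    return [
        row.dag_id
        for row in rows
        if row.processor_subdir == current_subdir and row.fileloc not in alive_filelocs
    ]


WORKTREES = "/opt/airflow/dags/.worktrees/"
rows = [
    DagRow("active_dag", "/opt/airflow/dags/repo/dags/active.py", WORKTREES + "NEW_HASH"),
    DagRow("orphan_dag", "/opt/airflow/dags/repo/dags/removed.py", WORKTREES + "OLD_HASH"),
]
alive = {"/opt/airflow/dags/repo/dags/active.py"}  # removed.py is gone from the repo

# The scheduler compares against the resolved path of the *current* commit,
# so the orphan, still tagged with the old hash, is never considered.
print(deactivation_candidates(rows, WORKTREES + "NEW_HASH", alive))  # []
```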
   
**Workaround**: We patched the Docker image with `sed` to remove the `.resolve()` call, matching the approach in PR #46877:
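```dockerfile
# Drop .resolve() so processor_subdir stays on the stable git-sync symlink path.
# sed exits 0 even when nothing matches, so grep -q fails the build if the edit did not apply.
RUN sed -i 's/return str(self._dag_directory.resolve())/return str(self._dag_directory)/' \
        /home/airflow/.local/lib/python3.12/site-packages/airflow/dag_processing/manager.py \
    && grep -q 'return str(self._dag_directory)' \
        /home/airflow/.local/lib/python3.12/site-packages/airflow/dag_processing/manager.py
```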
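If you would rather have the build fail whenever the upstream line changes (for example after an Airflow upgrade), the same substitution can be done by a small script that insists on exactly one match. This is a hypothetical helper: `patch_manager` and the script name are made up, and it performs the same text replacement as the `sed` command above.

```python
import sys
from pathlib import Path

OLD = "return str(self._dag_directory.resolve())"
NEW = "return str(self._dag_directory)"


def patch_manager(path):
    """Replace OLD with NEW in `path`; abort without writing unless OLD occurs exactly once."""
    p = Path(path)
    source = p.read_text()
    count = source.count(OLD)
    if count != 1:
        raise SystemExit(f"{p}: expected 1 occurrence of {OLD!r}, found {count}")
    p.write_text(source.replace(OLD, NEW))


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python patch_manager.py /home/airflow/.local/lib/python3.12/site-packages/airflow/dag_processing/manager.py
    patch_manager(sys.argv[1])
```

In a Dockerfile this would replace the `sed`/`grep` pair with a `COPY` of the script followed by a `RUN python patch_manager.py <path to manager.py>` step.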
   
After applying the patch and normalizing the existing `processor_subdir` values in the DB, all 21 orphan DAGs were correctly deactivated on the next scheduler parse cycle.
   
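Note: existing DB records must also be normalized after applying the patch, because rows written before it still carry hash-specific `processor_subdir` paths that the now-unresolved DAG directory will never match: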
```sql
UPDATE dag SET processor_subdir = '/opt/airflow/dags/repo'
WHERE processor_subdir LIKE '/opt/airflow/dags/.worktrees/%';
```
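The normalization `UPDATE` above can be sanity-checked against a throwaway in-memory SQLite table before it is run on the metadata DB. The table is a hypothetical three-column stand-in for Airflow's `dag` table, not its real schema; only the `UPDATE` statement is taken from the comment. The final `SELECT` is a simplified subdir-scoped lookup, not Airflow's actual query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical cut-down stand-in for Airflow's `dag` table (not the real schema).
conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY, fileloc TEXT, processor_subdir TEXT)")
conn.executemany(
    "INSERT INTO dag VALUES (?, ?, ?)",
    [
        ("active_dag", "/opt/airflow/dags/repo/dags/active.py", "/opt/airflow/dags/.worktrees/new-hash"),
        ("orphan_dag", "/opt/airflow/dags/repo/dags/removed.py", "/opt/airflow/dags/.worktrees/old-hash"),
        ("other_dag", "/opt/airflow/dags/repo/dags/other.py", "/opt/airflow/dags/repo"),
    ],
)

# The normalization statement from the comment, unchanged.
cur = conn.execute(
    "UPDATE dag SET processor_subdir = '/opt/airflow/dags/repo' "
    "WHERE processor_subdir LIKE '/opt/airflow/dags/.worktrees/%'"
)
print(cur.rowcount)  # 2: only the worktree rows are rewritten

# With every row on the stable path, the stale fileloc is visible to a subdir-scoped lookup.
alive = ("/opt/airflow/dags/repo/dags/active.py", "/opt/airflow/dags/repo/dags/other.py")
stale = conn.execute(
    "SELECT dag_id FROM dag WHERE processor_subdir = ? AND fileloc NOT IN (?, ?)",
    ("/opt/airflow/dags/repo", *alive),
).fetchall()
print(stale)  # [('orphan_dag',)]
```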
   
This is specific to git-sync's worktree symlink structure. Non-symlink setups (direct volume mounts, baked-in DAGs) are unaffected.
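One way to tell which category a deployment falls into is to check whether resolving the configured DAG folder (`[core] dags_folder`) changes the path. A sketch under that assumption; `dag_folder_uses_symlink` is a hypothetical helper, not an Airflow API, and it assumes a POSIX filesystem for the demo symlink.

```python
import os
import tempfile
from pathlib import Path


def dag_folder_uses_symlink(dags_folder):
    """True if resolving `dags_folder` rewrites it, i.e. some path component is a symlink.

    Hypothetical helper: only such setups can record a hash-specific processor_subdir;
    a symlink whose target never changes would still resolve to the same path every time.
    """
    path = Path(os.path.abspath(dags_folder))  # absolute and lexically normalized
    return path.resolve() != path


# Demo layout in a temp dir; resolve() the base so the temp dir itself has no symlinks.
base = Path(tempfile.mkdtemp()).resolve()
(base / ".worktrees" / "abc123" / "dags").mkdir(parents=True)
os.symlink(Path(".worktrees") / "abc123", base / "repo")  # git-sync-style: repo -> .worktrees/<hash>
baked_in = base / "baked"
baked_in.mkdir()

print(dag_folder_uses_symlink(base / "repo" / "dags"))  # True
print(dag_folder_uses_symlink(baked_in))                # False
```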

