GitHub user uplsh580 edited a discussion: [Question] Bundle-specific Python
Path Isolation in Airflow 3.x Git Bundles
### [Question] Bundle-specific Python Path Isolation in Airflow 3.x Git Bundles
**Environment:**
- Airflow Version: 3.1.7
- Deployment: Git Bundles
- Setup: Multi-tenant environment where each Git Repository (Bundle) belongs to
a specific team.
---
**Context (Path Architecture):**
Our infrastructure deploys code into different paths depending on the Airflow
component:
1. **DAG Processor:**
`{BUNDLE_ROOT}/{bundle_name}/tracking_repo/{user_code_root}/`
2. **Worker:**
`{BUNDLE_ROOT}/{bundle_name}/version/{commit_id}/{user_code_root}/`
**The Directory Structure (Inside `{user_code_root}`):**
```text
{user_code_root}/
├── airflow_lib/          # Internal library (shared across DAGs in the same bundle)
│ ├── __init__.py
│ ├── constants/
│ └── util/
├── dags/ # Actual DAG files
│ ├── my_dag.py
│ └── sub_dir/
│ └── nested_dag.py
└── README.md
```
**The Problem:**
When a DAG (e.g., `my_dag.py`) is parsed by the DAG Processor or executed by
the Worker, it fails with `ModuleNotFoundError: No module named 'airflow_lib'`.
This is because the `{user_code_root}` directory—which contains the
`airflow_lib` package—is not automatically added to `sys.path`. Since the path
is dynamic (varies by `commit_id` on Workers) and inconsistent (differs between
Processor and Worker), we cannot use a static global `PYTHONPATH`.
**Key Challenges:**
1. **Component Path Discrepancy**: Any solution must work for both the
`tracking_repo` path on the Processor and the `version/{commit_id}` path on the
Worker.
2. **Namespace Collisions**: Multiple bundles (teams) may each ship their own
`airflow_lib` package. Adding every bundle root to `sys.path` would cause import
collisions and version cross-talk between teams.
3. **No Manual Code Changes**: We want to avoid forcing hundreds of developers
to add `sys.path.append` logic to every DAG file.
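To make the cross-talk in point 2 concrete, here is a small self-contained sketch (the helper name and layout are ours, not an Airflow API). Even with the correct bundle root prepended to `sys.path`, Python's `sys.modules` cache can still hand back another bundle's previously imported `airflow_lib` unless the stale entry is dropped:

```python
import importlib
import sys


def import_airflow_lib_from(bundle_root: str):
    """Import 'airflow_lib' from one specific bundle root.

    Demonstrates the cross-talk problem: sys.modules caches the first
    'airflow_lib' that was imported, so having the right root on sys.path
    is not enough -- the stale cache entry must be dropped first.
    """
    sys.path.insert(0, bundle_root)           # this bundle's root wins lookup
    try:
        sys.modules.pop("airflow_lib", None)  # drop another bundle's cached copy
        importlib.invalidate_caches()         # reset the path-finder caches
        return importlib.import_module("airflow_lib")
    finally:
        sys.path.remove(bundle_root)          # keep sys.path clean afterwards
```

Any isolation mechanism the platform provides would need to handle both halves of this: path ordering *and* module-cache invalidation between bundles sharing one interpreter.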
**Questions:**
1. Is there a way to configure Airflow 3.x to automatically recognize the
Bundle's specific root (`{user_code_root}`) as a Python source root during
parsing and execution?
2. Are there any hooks or listeners (like the `on_task_instance_running`
listener or DAG policies) that are recommended for injecting
component-specific paths dynamically?
3. How should we handle the fact that the import root changes between the DAG
Processor (tracking) and Worker (versioned) while maintaining the same import
statement `from airflow_lib import ...`?
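For reference, the only workaround we have found so far is a per-file shim like the sketch below (`find_user_code_root` and `ensure_bundle_path` are our hypothetical names, not Airflow APIs). Because it resolves the root relative to the DAG file itself, it works under both the `tracking_repo` and `version/{commit_id}` layouts; question 3 is essentially asking whether this logic can live in the platform layer instead of every DAG file:

```python
import sys
from pathlib import Path


def find_user_code_root(dag_file: str, marker: str = "airflow_lib") -> Path:
    """Walk upward from a DAG file until a directory containing `marker` is found.

    Because the lookup is relative to the DAG file itself, the same logic
    works under both component layouts:
      .../tracking_repo/{user_code_root}/dags/my_dag.py            (Processor)
      .../version/{commit_id}/{user_code_root}/dags/my_dag.py      (Worker)
    """
    for parent in Path(dag_file).resolve().parents:
        if (parent / marker).is_dir():
            return parent
    raise ModuleNotFoundError(f"no '{marker}' package found above {dag_file}")


def ensure_bundle_path(dag_file: str) -> None:
    """Prepend this bundle's code root so its airflow_lib shadows other bundles'."""
    root = str(find_user_code_root(dag_file))
    if root not in sys.path:
        sys.path.insert(0, root)
```

Each DAG file would then call `ensure_bundle_path(__file__)` before `from airflow_lib import ...`, which is exactly the boilerplate we would like to eliminate.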
We are looking for a clean, infrastructure-level solution that aligns with the
AIP-66 design philosophy. Thank you!
GitHub link: https://github.com/apache/airflow/discussions/61901