thedavidnovak opened a new issue, #63548: URL: https://github.com/apache/airflow/issues/63548
### What do you see as an issue?

The [Modules Management](https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/modules_management.html#modules-management) documentation describes three "Built-in PYTHONPATH entries in Airflow" — `dags`, `config`, and `plugins` — and treats them as equivalent: directories appended to `sys.path`. This is misleading because the `plugins` folder has significantly different module-loading semantics.

Unlike the `dags` and `config` folders, which are simply added to `sys.path` (making them searchable for future imports), the `plugins` folder's `.py` files are **actively imported at startup** by `plugins_manager.load_plugins_from_plugin_directory()` ([2.10.3](https://github.com/apache/airflow/blob/2.10.3/airflow/plugins_manager.py#L290-L297), [3.1.8](https://github.com/apache/airflow/blob/3.1.8/airflow-core/src/airflow/plugins_manager.py#L284-L291)) and registered in [`sys.modules`](https://docs.python.org/3/reference/import.html#the-module-cache) under the bare filename. This means they take effect globally and immediately, before any DAG code runs:

```python
# airflow/plugins_manager.py (2.10.3, lines 290-297; 3.1.8, lines 284-291)
mod_name = path.stem  # e.g. "hdfs" for hdfs.py — subdirectory path is ignored
loader = importlib.machinery.SourceFileLoader(mod_name, file_path)
spec = importlib.util.spec_from_loader(mod_name, loader)
mod = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = mod  # registered as a top-level module
loader.exec_module(mod)
```

Since `mod_name` is derived from `path.stem` only, the subdirectory structure is irrelevant. A file at `plugins/my_project/operators/hdfs.py` is registered as `sys.modules["hdfs"]`, not `sys.modules["my_project.operators.hdfs"]`.
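The registration pattern can be reproduced outside Airflow with plain Python. The sketch below is illustrative only (the `MARKER` constant and the file layout are invented, and no Airflow code is involved): a file nested several directories deep is cached under its bare stem, so a later plain `import` resolves to it.

```python
import importlib.machinery
import importlib.util
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as plugins_dir:
    # Invented layout mirroring plugins/my_project/operators/hdfs.py
    nested = pathlib.Path(plugins_dir, "my_project", "operators")
    nested.mkdir(parents=True)
    plugin_file = nested / "hdfs.py"
    plugin_file.write_text("MARKER = 'loaded from plugins'\n")

    # Same loading pattern as load_plugins_from_plugin_directory():
    mod_name = plugin_file.stem  # "hdfs" -- the subdirectory path plays no role
    loader = importlib.machinery.SourceFileLoader(mod_name, str(plugin_file))
    spec = importlib.util.spec_from_loader(mod_name, loader)
    mod = importlib.util.module_from_spec(spec)
    sys.modules[spec.name] = mod  # cached as a top-level "hdfs" module
    loader.exec_module(mod)

    import hdfs  # cache hit: no sys.path search happens at all
    print(hdfs.MARKER)  # -> loaded from plugins
```

Because `import` consults the `sys.modules` cache before invoking any finder, the cached entry wins even when a real `hdfs` distribution is installed in site-packages.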
With `dags`, `config`, or any custom `PYTHONPATH` entry, adding a directory to `sys.path` does not import anything — modules are only loaded when explicitly requested via an `import` statement, and only if the import path matches. With `plugins`, the `sys.modules` cache is populated at process startup, so any subsequent `import` of a module with a matching name — anywhere in the process — will resolve to the plugin file instead of the intended package.

**Reproduction:**

1. Place a file named `hdfs.py` (with any content — it does not need to define an `AirflowPlugin` subclass) in a subdirectory of the plugins folder, e.g. `plugins/my_project/hdfs.py`.
2. Create a DAG that imports `WebHDFSHook` from `apache-airflow-providers-apache-hdfs`.
3. Start Airflow.

The provider's [`webhdfs.py`](https://github.com/apache/airflow/blob/2.10.3/airflow/providers/apache/hdfs/hooks/webhdfs.py#L27) does `from hdfs import HdfsError, InsecureClient`, which resolves to the custom plugin file instead of the [PyPI `hdfs` library](https://pypi.org/project/hdfs/), producing:

```
ImportError: cannot import name 'HdfsError' from 'hdfs' (/usr/local/airflow/plugins/my_project/hdfs.py)
```

Screenshot of the DAG Import Errors UI:

<img width="1474" height="219" alt="Image" src="https://github.com/user-attachments/assets/8cacc8f8-35df-4bec-a19c-06c65e39a3d1" />

The [Plugins page](https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/plugins.html) briefly says "The python modules in the plugins folder get imported," but does not explain the `sys.modules` registration consequence or contrast it with how `dags`/`config`/`PYTHONPATH` directories work.

**Target audience perspective:** A user reading the Modules Management page would reasonably conclude that placing files in the `plugins` folder has the same import mechanics as placing files in the `dags` folder — just with the added plugin-registration behavior.
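The failure mode in the reproduction can also be demonstrated without a running Airflow instance. In this sketch, a `types.ModuleType` object stands in for the loaded plugin file (in the real scenario the cached entry carries the plugin file's path, which is why the error message points at `/usr/local/airflow/plugins/my_project/hdfs.py`):

```python
import sys
import types

# Stand-in for plugins/my_project/hdfs.py after plugin loading has run.
sys.modules["hdfs"] = types.ModuleType("hdfs")

try:
    from hdfs import HdfsError, InsecureClient  # what webhdfs.py executes
except ImportError as exc:
    # e.g. "cannot import name 'HdfsError' from 'hdfs' (...)"
    print(exc)
```

The `from hdfs import ...` statement never reaches the real PyPI package: Python finds `"hdfs"` in `sys.modules`, attempts an attribute lookup on the stand-in, and raises `ImportError`.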
There is no warning that the `plugins` folder has a significantly broader name-collision surface. The "Best practices for your code naming" section advises using unique top-level package names, but this advice applies only to files placed directly at the root of a `PYTHONPATH` directory. For the `plugins` folder, the risk extends to `.py` files **at any depth** due to the `sys.modules` registration behavior, which the documentation does not mention.

### Solving the problem

Two additions to [`modules_management.rst`](https://github.com/apache/airflow/blob/main/airflow-core/docs/administration-and-deployment/modules_management.rst) would address this:

**A) In the "Built-in PYTHONPATH entries in Airflow" section:** Add a warning after the bullet listing the `plugins` folder, clarifying that unlike `dags` and `config` (which are only added to `sys.path`), the plugins folder's `.py` files are **actively imported at startup** and registered in `sys.modules` under the bare filename as a top-level module. This means name collisions in the plugins folder affect the entire process globally, not just code that explicitly imports from that directory. A cross-reference to the [Plugins page](https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/plugins.html) and to `plugins_manager.load_plugins_from_plugin_directory()` would help users understand the full loading lifecycle.

**B) In the "Use unique top package name" section:** Add a warning that this advice is **especially critical** for the plugins folder due to the automatic import behavior. Mention that the filename alone determines the `sys.modules` key (subdirectory structure is ignored), so even deeply nested files can cause top-level collisions. A brief concrete example (e.g., a file at `plugins/my_project/operators/hdfs.py` will be registered as `sys.modules["hdfs"]`, shadowing the PyPI `hdfs` package that `apache-airflow-providers-apache-hdfs` depends on) would make the risk clear.
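Independent of the documentation fix, the collision surface that suggestion (B) warns about can be audited mechanically. The following is a hypothetical helper, not part of Airflow (the function name `find_shadowing_plugins` is invented): it walks a plugins folder and flags any `.py` file whose bare stem also resolves to a module outside that folder.

```python
import importlib.util
import pathlib


def find_shadowing_plugins(plugins_dir):
    """Yield (plugin_file, stem) pairs whose bare filename stem also resolves
    to a module outside the plugins folder (stdlib or site-packages)."""
    root = pathlib.Path(plugins_dir).resolve()
    for path in sorted(root.rglob("*.py")):
        stem = path.stem
        if stem == "__init__":
            continue
        try:
            spec = importlib.util.find_spec(stem)
        except (ImportError, ValueError):
            continue  # name is not importable elsewhere; no collision
        if spec is None or spec.origin is None:
            continue  # not found, or a namespace package with no single file
        if root not in pathlib.Path(spec.origin).resolve().parents:
            # After the next restart, `path` becomes sys.modules[stem],
            # shadowing whatever spec.origin currently points at.
            yield path, stem
```

For example, with the PyPI `hdfs` package installed, a file at `plugins/my_project/operators/hdfs.py` would be reported because `hdfs` also resolves to site-packages.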
### Anything else

Crucially, if the conflicting file is added while Airflow is already running, no collision occurs until the next restart — making the root cause non-obvious.

Scheduler logs showing the collision (reproduced locally with Astro Runtime 12.5.0):

```
[2026-03-13T13:30:00.328+0000] {processor.py:401} WARNING - Error when trying to pre-import module 'airflow.providers.apache.hdfs.hooks.webhdfs' found in /usr/local/airflow/dags/hdfs_sensor_example.py: cannot import name 'HdfsError' from 'hdfs' (/usr/local/airflow/plugins/my_project/hdfs.py)
```

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
