thedavidnovak opened a new issue, #63548:
URL: https://github.com/apache/airflow/issues/63548

   ### What do you see as an issue?
   
   The [Modules 
Management](https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/modules_management.html#modules-management)
 documentation describes three "Built-in PYTHONPATH entries in Airflow" — 
`dags`, `config`, and `plugins` — and treats them as equivalent: directories 
appended to `sys.path`. This is misleading because the `plugins` folder has 
significantly different module-loading semantics.
   
   Unlike the `dags` and `config` folders, which are simply added to `sys.path` 
(making them searchable for future imports), the `plugins` folder's `.py` files 
are **actively imported at startup** by 
`plugins_manager.load_plugins_from_plugin_directory()` 
([2.10.3](https://github.com/apache/airflow/blob/2.10.3/airflow/plugins_manager.py#L290-L297),
 
[3.1.8](https://github.com/apache/airflow/blob/3.1.8/airflow-core/src/airflow/plugins_manager.py#L284-L291))
 and registered in 
[`sys.modules`](https://docs.python.org/3/reference/import.html#the-module-cache)
 under the bare filename. This means they take effect globally and immediately, 
before any DAG code runs:
   
   ```python
   # airflow/plugins_manager.py (2.10.3, lines 290-297; 3.1.8, lines 284-291)
   mod_name = path.stem  # e.g. "hdfs" for hdfs.py; subdirectory path is ignored
   loader = importlib.machinery.SourceFileLoader(mod_name, file_path)
   spec = importlib.util.spec_from_loader(mod_name, loader)
   mod = importlib.util.module_from_spec(spec)
   sys.modules[spec.name] = mod  # registered as a top-level module
   loader.exec_module(mod)
   ```
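   The global effect of this registration can be demonstrated outside Airflow. The sketch below (hypothetical paths and file contents, not Airflow code) replays the same `importlib` calls on a nested file and shows that a later `import` resolves straight from the cache:

   ```python
   # Sketch: replay the loader steps above on a temp file named hdfs.py
   # that lives two directories deep, as in plugins/my_project/operators/.
   import importlib.machinery
   import importlib.util
   import sys
   import tempfile
   from pathlib import Path

   root = Path(tempfile.mkdtemp())
   file_path = root / "my_project" / "operators" / "hdfs.py"
   file_path.parent.mkdir(parents=True)
   file_path.write_text("GREETING = 'from the plugin file'\n")

   mod_name = file_path.stem  # "hdfs": the subdirectories play no role
   loader = importlib.machinery.SourceFileLoader(mod_name, str(file_path))
   spec = importlib.util.spec_from_loader(mod_name, loader)
   mod = importlib.util.module_from_spec(spec)
   sys.modules[spec.name] = mod  # cached as the top-level name "hdfs"
   loader.exec_module(mod)

   import hdfs  # cache hit: no sys.path search happens at all

   print(hdfs.GREETING)  # -> from the plugin file
   ```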
   
   Since `mod_name` is derived from `path.stem` only, the subdirectory 
structure is irrelevant. A file at `plugins/my_project/operators/hdfs.py` is 
registered as `sys.modules["hdfs"]`, not 
`sys.modules["my_project.operators.hdfs"]`.
   
   With `dags`, `config`, or any custom `PYTHONPATH` entry, adding a directory 
to `sys.path` does not import anything — modules are only loaded when 
explicitly requested via an `import` statement, and only if the import path 
matches. With `plugins`, the `sys.modules` cache is populated at process 
startup, so any subsequent `import` of a module with a matching name — anywhere 
in the process — will resolve to the plugin file instead of the intended 
package.
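   This difference is easy to verify in plain Python. In the sketch below (the module name `shadow_demo` is made up), appending a directory to `sys.path` imports nothing, while seeding `sys.modules` makes every later import resolve to the seeded object:

   ```python
   import sys
   import tempfile
   import types
   from pathlib import Path

   d = Path(tempfile.mkdtemp())
   (d / "shadow_demo.py").write_text("VALUE = 1\n")

   # dags/config behavior: extend the search path. Nothing is imported yet.
   sys.path.append(str(d))
   assert "shadow_demo" not in sys.modules

   # plugins behavior (emulated): pre-populate the module cache.
   stub = types.ModuleType("shadow_demo")
   stub.VALUE = 99
   sys.modules["shadow_demo"] = stub

   import shadow_demo  # cache hit: the file on sys.path is never read

   print(shadow_demo.VALUE)  # -> 99
   ```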
   
   **Reproduction:** Place a file named `hdfs.py` (with any content — it does 
not need to define an `AirflowPlugin` subclass) in a subdirectory of the 
plugins folder, e.g. `plugins/my_project/hdfs.py`. Create a DAG that imports 
`WebHDFSHook` from `apache-airflow-providers-apache-hdfs`. Start Airflow. The 
provider's 
[`webhdfs.py`](https://github.com/apache/airflow/blob/2.10.3/airflow/providers/apache/hdfs/hooks/webhdfs.py#L27)
 does `from hdfs import HdfsError, InsecureClient`, which resolves to the 
custom plugin file instead of the [PyPI `hdfs` 
library](https://pypi.org/project/hdfs/), producing:
   
   ```
   ImportError: cannot import name 'HdfsError' from 'hdfs' (/usr/local/airflow/plugins/my_project/hdfs.py)
   ```
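   The same failure shape can be reproduced without installing Airflow or the `hdfs` library at all; an empty stub module standing in for the plugin file is enough (a sketch with a hypothetical setup, not the actual Airflow startup path):

   ```python
   import sys
   import types

   # Stand-in for plugins/my_project/hdfs.py registered under its bare name:
   sys.modules["hdfs"] = types.ModuleType("hdfs")

   error = None
   try:
       # What the provider's webhdfs.py hook does at import time:
       from hdfs import HdfsError, InsecureClient
   except ImportError as exc:
       error = exc

   print(error)  # cannot import name 'HdfsError' from 'hdfs' ...
   ```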
   
   Screenshot of the DAG Import Errors UI:
   
   <img width="1474" height="219" alt="Image" src="https://github.com/user-attachments/assets/8cacc8f8-35df-4bec-a19c-06c65e39a3d1" />
   
   The [Plugins 
page](https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/plugins.html)
 briefly says "The python modules in the plugins folder get imported," but does 
not explain the `sys.modules` registration consequence or contrast it with how 
`dags`/`config`/`PYTHONPATH` directories work.
   
   **Target audience perspective:** A user reading the Modules Management page 
would reasonably conclude that placing files in the `plugins` folder has the 
same import mechanics as placing files in the `dags` folder — just with the 
added plugin-registration behavior. There is no warning that the `plugins` 
folder has a significantly broader name-collision surface. The "Best practices 
for your code naming" section advises using unique top-level package names, but 
this advice applies only to files placed directly at the root of a `PYTHONPATH` 
directory. For the `plugins` folder, the risk extends to `.py` files **at any 
depth** due to the `sys.modules` registration behavior, which the documentation 
does not mention.
   
   ### Solving the problem
   
   Two additions to 
[`modules_management.rst`](https://github.com/apache/airflow/blob/main/airflow-core/docs/administration-and-deployment/modules_management.rst)
 would address this:
   
   **A) In the "Built-in PYTHONPATH entries in Airflow" section:**
   
   Add a warning after the bullet listing the `plugins` folder, clarifying that 
unlike `dags` and `config` (which are only added to `sys.path`), the plugins 
folder's `.py` files are **actively imported at startup** and registered in 
`sys.modules` under the bare filename as a top-level module. This means name 
collisions in the plugins folder affect the entire process globally, not just 
code that explicitly imports from that directory. A cross-reference to the 
[Plugins 
page](https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/plugins.html)
 and to `plugins_manager.load_plugins_from_plugin_directory()` would help users 
understand the full loading lifecycle.
   
   **B) In the "Use unique top package name" section:**
   
   Add a warning that this advice is **especially critical** for the plugins 
folder due to the automatic import behavior. Mention that the filename alone 
determines the `sys.modules` key (subdirectory structure is ignored), so even 
deeply nested files can cause top-level collisions. A brief concrete example 
(e.g., a file at `plugins/my_project/operators/hdfs.py` will be registered as 
`sys.modules["hdfs"]`, shadowing the PyPI `hdfs` package that 
`apache-airflow-providers-apache-hdfs` depends on) would make the risk clear.
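   As a concrete starting point, the warning for (B) might read roughly as follows (draft wording only; exact placement and phrasing up to the docs maintainers):

   ```rst
   .. warning::

       This advice is especially important for the ``plugins`` folder. Every
       ``.py`` file found there is imported at startup and registered in
       ``sys.modules`` under its bare filename, regardless of how deeply it is
       nested. For example, ``plugins/my_project/operators/hdfs.py`` becomes
       ``sys.modules["hdfs"]``, shadowing the PyPI ``hdfs`` package that
       ``apache-airflow-providers-apache-hdfs`` depends on.
   ```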
   
   ### Anything else
   
   Crucially, if the conflicting file is added while Airflow is already 
running, no collision occurs until the next restart — making the root cause 
non-obvious.
   
   Scheduler logs showing the collision (reproduced locally with Astro Runtime 
12.5.0):
   
   ```
   [2026-03-13T13:30:00.328+0000] {processor.py:401} WARNING - Error when trying to pre-import module
     'airflow.providers.apache.hdfs.hooks.webhdfs' found in /usr/local/airflow/dags/hdfs_sensor_example.py:
     cannot import name 'HdfsError' from 'hdfs' (/usr/local/airflow/plugins/my_project/hdfs.py)
   ```
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

