GitHub user LucasRoesler added a comment to the discussion: Sharepoint ingest 
using Microsoft Graph Filesystem

I did a bit of experimenting and learned the following:

1. You can get the list of known filesystems from fsspec using 

    ```python
    import fsspec 
    print(f"Available protocols: {fsspec.available_protocols()}")
    ```
  
    The output doesn't include the expected `msgd` protocol, and trying to 
re-register the filesystem doesn't help. This led me to my next insight 
(which is obvious as soon as I reread the error).

2. I realized the error isn't even coming from `fsspec`, even though it _is 
being used_ internally. Rather, it comes from `airflow.io.get_fs`. Airflow has 
its _own_ method for Providers to register filesystems! It isn't enough to 
install the relevant implementation for `fsspec`, even if you call 
`register_implementation`. Instead, the provider must provide a list of 
`filesystems` in the ProviderInfo.data. Each entry in `filesystems` must be a 
module with a `get_fs` function. It also seems like it needs to include a 
`schemes` list. These are then combined to create the required mapping in 
`airflow.io.get_fs`.

    BUT, all of this seems to be done: 
    - the [provider info has a 
"filesystems"](https://github.com/apache/airflow/blob/5326d9444fd1bda4b98b264b48360d5c437e017b/providers/microsoft/azure/src/airflow/providers/microsoft/azure/get_provider_info.py#L201)
 , and 
    - the filesystem for the [msgraph has both `schemes` and 
`get_fs`](https://github.com/apache/airflow/blob/main/providers/microsoft/azure/src/airflow/providers/microsoft/azure/fs/msgraph.py)
 
    
    I don't think it is an issue with my installation because I found all of 
this in my local venv as well. 
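
To make the registration mechanism described above concrete, here is a minimal sketch of how modules exposing `schemes` and `get_fs` could be combined into a scheme-to-factory mapping. It is not Airflow's actual implementation: `build_scheme_mapping` and the two stand-in modules are hypothetical names used only for illustration.

```python
from types import SimpleNamespace


def build_scheme_mapping(filesystem_modules):
    """Map every scheme a module declares to that module's get_fs function."""
    mapping = {}
    for module in filesystem_modules:
        get_fs = getattr(module, "get_fs", None)
        if get_fs is None:
            continue  # a module without get_fs has nothing to register
        for scheme in getattr(module, "schemes", []):
            mapping[scheme] = get_fs
    return mapping


# Hypothetical stand-ins for provider filesystem modules (not the real ones).
adls = SimpleNamespace(
    schemes=["abfs", "abfss", "adl"],
    get_fs=lambda conn_id=None: "adls-fs",
)
msgraph = SimpleNamespace(
    schemes=["msgd"],
    get_fs=lambda conn_id=None: "msgraph-fs",
)

mapping = build_scheme_mapping([adls, msgraph])
print(sorted(mapping))
```

If a module in the list cannot be imported at all, none of its schemes would ever reach such a mapping, which matches the symptom of a missing `msgd` entry.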

Those two findings led me to inspect the registered filesystems directly, 
where I noticed this log message:

```python 
>>> from airflow import io
/home/lucas/code/telekom/gsus/pro-89/modeling/.venv/lib/python3.12/site-packages/airflow/configuration.py:859
 DeprecationWarning: The secret_key option in [webserver] has been moved to the 
secret_key option in [api] - the old setting has been used, but please update 
your config.
>>> io._register_filesystems()
[2025-11-21T18:36:11.870+0100] {configuration.py:1077} WARNING - section/key 
[openlineage/namespace] not found in config
[2025-11-21T18:36:12.461+0100] {local.py:145} WARNING - To enable emitting 
Openlineage events, upgrade to Airflow 2.7 or install 
astronomer-cosmos[openlineage].
[2025-11-21T18:36:13.029+0100] {providers_manager.py:266} INFO - Optional 
provider feature disabled when importing 
'airflow.providers.microsoft.azure.fs.msgraphfs.get_fs' from 
'apache-airflow-providers-microsoft-azure' package
{'file': <function _file at 0x7f3f339f0a40>, 'local': <function _file at 
0x7f3f339f0a40>, 'abfs': <function get_fs at 0x7f3f2eec3ce0>, 'abfss': 
<function get_fs at 0x7f3f2eec3ce0>, 'adl': <function get_fs at 0x7f3f2eec3ce0>}
```

So I turned on debug logging and got some more details 

```python 
[2025-11-21T18:38:54.336+0100] {providers_manager.py:359} DEBUG - 
Initialization of Providers Manager[list] took 1.20 seconds
[2025-11-21T18:38:54.337+0100] {providers_manager.py:260} DEBUG - Optional 
feature disabled on exception when importing 
'airflow.providers.microsoft.azure.fs.msgraphfs.get_fs' from 
'apache-airflow-providers-microsoft-azure' package
Traceback (most recent call last):
  File 
"/home/lucas/code/telekom/gsus/pro-89/modeling/.venv/lib/python3.12/site-packages/airflow/providers_manager.py",
 line 306, in _correctness_check
    imported_class = import_string(class_name)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^
  File 
"/home/lucas/code/telekom/gsus/pro-89/modeling/.venv/lib/python3.12/site-packages/airflow/utils/module_loading.py",
 line 39, in import_string
    module = import_module(module_path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File 
"/home/lucas/.local/share/uv/python/cpython-3.12.5-linux-x86_64-gnu/lib/python3.12/importlib/__init__.py",
 line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1324, in _find_and_load_unlocked
ModuleNotFoundError: No module named 
'airflow.providers.microsoft.azure.fs.msgraphfs'
[2025-11-21T18:38:54.337+0100] {providers_manager.py:266} INFO - Optional 
provider feature disabled when importing 
'airflow.providers.microsoft.azure.fs.msgraphfs.get_fs' from 
'apache-airflow-providers-microsoft-azure' package
[2025-11-21T18:38:54.337+0100] {providers_manager.py:359} DEBUG - 
Initialization of Providers Manager[filesystems] took 1.20 seconds
```


A `ModuleNotFoundError` is strange, because I _thought_ I was staring directly 
at the module. But there is a typo! (The debug logging above was enabled with 
`AIRFLOW__LOGGING__LOGGING_LEVEL="DEBUG"`.)

This line in the [provider info](
https://github.com/apache/airflow/blob/5326d9444fd1bda4b98b264b48360d5c437e017b/providers/microsoft/azure/src/airflow/providers/microsoft/azure/get_provider_info.py#L203)

```python
        "filesystems": [
            "airflow.providers.microsoft.azure.fs.adls",
            "airflow.providers.microsoft.azure.fs.msgraphfs",
        ],
```
should be 
```python 
        "filesystems": [
            "airflow.providers.microsoft.azure.fs.adls",
            "airflow.providers.microsoft.azure.fs.msgraph",
        ],
```
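
A typo like this can be caught by checking that every declared module path actually resolves. The following is a rough sketch using only the standard library; `find_unimportable` is a hypothetical helper, and the standard-library module names stand in for the provider's `filesystems` entries so the example runs without Airflow installed.

```python
import importlib.util


def find_unimportable(module_names):
    """Return the dotted module names that cannot be located."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            spec = None  # a parent package of `name` does not exist
        if spec is None:
            missing.append(name)
    return missing


# Stand-in entries: the last one mimics the misspelled module name.
declared = ["json", "json.decoder", "json.decoderfs"]
print(find_unimportable(declared))
```

Running the same check against the real `filesystems` list in an environment with the provider installed should flag `airflow.providers.microsoft.azure.fs.msgraphfs`, assuming the module layout shown in the links above.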

This discussion should be converted to a bug ticket and, _if_ that is OK, I 
will submit a patch. 

GitHub link: 
https://github.com/apache/airflow/discussions/58221#discussioncomment-15040696
