GitHub user LucasRoesler added a comment to the discussion: Sharepoint ingest 
using Microsoft Graph Filesystem

Sadly, I don't think my PR #58568 is enough to make this functional. When I
use the patch locally, it can find the filesystem, but I get errors during
initialization.

1. I _must_ pass the `conn_id` parameter to `ObjectStoragePath`. The
[documentation here](https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/stable/filesystems/msgraph.html)
suggests that it will be inferred from the file URI. However, I was able to
show definitively that this doesn't work by adding a log line to the start of
the `get_fs` method in the provider:
    ```python
    def get_fs(conn_id: str | None, storage_options: dict[str, Any] | None = None) -> AbstractFileSystem:
        from msgraphfs import MSGDriveFS

        if conn_id is None:
            # Added for debugging: shows when no conn_id reaches the provider.
            logger.warning("No connection ID provided, using default MSGDriveFS loading from MSGRAPHFS env variables.")
            return MSGDriveFS({})
    ```
    This produces errors that look like this:

    ```text
    [2025-11-24, 14:12:14] WARNING - No connection ID provided, using default MSGDriveFS loading from MSGRAPHFS env variables.: source="airflow.providers.microsoft.azure.fs.msgraph"
    [2025-11-24, 14:12:14] ERROR - Task failed with exception: source="task"
    ValueError: Either oauth2_client_params must be provided, or all of client_id, tenant_id, and client_secret must be provided (either as parameters or environment variables MSGRAPHFS_CLIENT_ID/AZURE_CLIENT_ID, MSGRAPHFS_TENANT_ID/AZURE_TENANT_ID, MSGRAPHFS_CLIENT_SECRET/AZURE_CLIENT_SECRET)

    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 920 in run
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 1215 in _execute_task
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/bases/operator.py", line 397 in wrapper
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/bases/decorator.py", line 251 in execute
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/bases/operator.py", line 397 in wrapper
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/providers/standard/operators/python.py", line 216 in execute
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/providers/standard/operators/python.py", line 239 in execute_callable
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/execution_time/callback_runner.py", line 81 in run
    File "/home/lucas/code/cool-project/data_platform/dags/examples/sharepoint_to_blob_dag.py", line 32 in list_sharepoint_files
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/upath/core.py", line 1442 in exists
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/upath/core.py", line 300 in fs
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/io/path.py", line 110 in _fs_factory
    File "/home/lucas/.local/share/uv/python/cpython-3.12.5-linux-x86_64-gnu/lib/python3.12/functools.py", line 993 in __get__
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/io/store.py", line 63 in fs
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/io/__init__.py", line 108 in get_fs
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/providers/microsoft/azure/fs/msgraph.py", line 39 in get_fs
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/fsspec/spec.py", line 84 in __call__
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/msgraphfs/core.py", line 1443 in __init__
    AttributeError: 'ObjectStoragePath' object has no attribute '_fs_cached'
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/upath/core.py", line 298 in fs
    ```

    The documentation is _very_ misleading. One of three things should happen:
    the documentation should be updated to show that `conn_id` is _always_
    required, **or** the code should be updated to do the implied extraction
    when `conn_id` _is_ `None`, **or** the documentation should state explicitly
    that omitting `conn_id` makes the filesystem read its credentials from
    environment variables, document those variables, and link back to the
    original documentation for more details.

2. Once you pass the `conn_id`, you continue to get similar errors. This led
me to add another debugging log line to verify that a connection is found and
to inspect the content of that connection:

    ```python
    def get_fs(conn_id: str | None, storage_options: dict[str, Any] | None = None) -> AbstractFileSystem:
        from msgraphfs import MSGDriveFS

        if conn_id is None:
            logger.warning("No connection ID provided, using default MSGDriveFS loading from MSGRAPHFS env variables.")
            return MSGDriveFS({})

        conn = BaseHook.get_connection(conn_id)
        extras = conn.extra_dejson
        conn_type = conn.conn_type or "msgraph"

        logger.info(f"Connection: {conn_id}, {conn}")
    ```

    Which produces logs like this:

    ```text
    [2025-11-24, 14:21:18] INFO - Connection: sharepoint, Connection(conn_id='sharepoint', conn_type='msgraph', description=None, host='secret-d265-4611-8dbb-secret', schema=None, login='secret-63e2-4f4d-8270-secret', password='***', port=None, extra='{\n  "scope": [\n    "https://graph.microsoft.com/.default"\n  ]\n}'): chan="stdout": source="task"
    [2025-11-24, 14:21:18] ERROR - Task failed with exception: source="task"
    ValueError: Either oauth2_client_params must be provided, or all of client_id, tenant_id, and client_secret must be provided (either as parameters or environment variables MSGRAPHFS_CLIENT_ID/AZURE_CLIENT_ID, MSGRAPHFS_TENANT_ID/AZURE_TENANT_ID, MSGRAPHFS_CLIENT_SECRET/AZURE_CLIENT_SECRET)

    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 920 in run
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 1215 in _execute_task
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/bases/operator.py", line 397 in wrapper
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/bases/decorator.py", line 251 in execute
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/bases/operator.py", line 397 in wrapper
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/providers/standard/operators/python.py", line 216 in execute
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/providers/standard/operators/python.py", line 239 in execute_callable
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/execution_time/callback_runner.py", line 81 in run
    File "/home/lucas/code/cool-project/data_platform/dags/examples/sharepoint_to_blob_dag.py", line 33 in list_sharepoint_files
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/upath/core.py", line 1442 in exists
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/fsspec/asyn.py", line 118 in wrapper
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/fsspec/asyn.py", line 103 in sync
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/fsspec/asyn.py", line 56 in _runner
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/msgraphfs/core.py", line 1833 in _exists
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/msgraphfs/core.py", line 1583 in _get_drive_fs
    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/fsspec/spec.py", line 84 in __call__

    File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/msgraphfs/core.py", line 1443 in __init__
    ```

    The rest of the code in `get_fs` looks like various attempts to extract
    additional configuration from the connection object, so I also added this
    debugging to the _end_ of the `get_fs` method:

    ```python
    logger.warning(f"MSGraphFS options: {options}")
    logger.warning(f"OAuth2 client params: {oauth2_client_params}")

    return MSGDriveFS(drive_id=drive_id, oauth2_client_params=oauth2_client_params)
    ```

    Resulting in logs like this:

    ```text
    [2025-11-24, 14:30:16] WARNING - MSGraphFS options: {'client_id': 'secret-63e2-4f4d-8270-secret', 'client_secret': '***', 'tenant_id': 'secret-d265-4611-8dbb-secret', 'scope': ['https://graph.microsoft.com/.default']}: source="airflow.providers.microsoft.azure.fs.msgraph"
    [2025-11-24, 14:30:16] WARNING - OAuth2 client params: {'client_id': 'secret-63e2-4f4d-8270-secret', 'client_secret': '***', 'tenant_id': 'secret-d265-4611-8dbb-secret', 'scope': ['https://graph.microsoft.com/.default']}: source="airflow.providers.microsoft.azure.fs.msgraph"
    [2025-11-24, 14:30:16] ERROR - Task failed with exception: source="task"
    ValueError: Either oauth2_client_params must be provided, or all of client_id, tenant_id, and client_secret must be provided (either as parameters or environment variables MSGRAPHFS_CLIENT_ID/AZURE_CLIENT_ID, MSGRAPHFS_TENANT_ID/AZURE_TENANT_ID, MSGRAPHFS_CLIENT_SECRET/AZURE_CLIENT_SECRET)
    # ... truncated for brevity ...
    ```

3. Now that we are sure that it is (a) finding a connection and (b) extracting
parameters from it, the next step is digging into the init chain for
`MSGDriveFS`. When I went directly to the line the error comes from in
`msgraphfs/core.py`, I found this block:
    ```python
    if oauth2_client_params is None:
        if not all([self.client_id, self.tenant_id, self.client_secret]):
            raise ValueError(
                "Either oauth2_client_params must be provided, or all of "
                "client_id, tenant_id, and client_secret must be provided "
                "(either as parameters or environment variables MSGRAPHFS_CLIENT_ID/"
                "AZURE_CLIENT_ID, MSGRAPHFS_TENANT_ID/AZURE_TENANT_ID, "
                "MSGRAPHFS_CLIENT_SECRET/AZURE_CLIENT_SECRET)"
            )
    ```

    This is bizarre because my logging above shows that we are providing a
    non-`None` value for `oauth2_client_params`.
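
One thing the second traceback does show is that the failing `__init__` is reached through `_exists` and `_get_drive_fs`, not through the provider's own `MSGDriveFS(...)` call. That leaves room for a guess: a second filesystem instance may be built lazily without receiving `oauth2_client_params`. The toy classes below only illustrate that failure mode. `ToyDriveFS`, `ToyRootFS`, and the `forward` flag are invented for this sketch and say nothing certain about how `msgraphfs` is actually written.

```python
class ToyDriveFS:
    """Toy filesystem with the same shape of credential check as the block above."""

    def __init__(self, client_id=None, tenant_id=None, client_secret=None,
                 oauth2_client_params=None):
        if oauth2_client_params is None:
            if not all([client_id, tenant_id, client_secret]):
                raise ValueError("credentials missing")
        self.oauth2_client_params = oauth2_client_params


class ToyRootFS:
    """Hypothetical wrapper that creates a per-drive filesystem on first use."""

    def __init__(self, oauth2_client_params, forward):
        self.oauth2_client_params = oauth2_client_params
        self.forward = forward  # invented switch: pass credentials on or not

    def get_drive_fs(self):
        if self.forward:
            return ToyDriveFS(oauth2_client_params=self.oauth2_client_params)
        # Credentials are dropped here, so the inner check fails even though
        # the outer object was built with a non-None value.
        return ToyDriveFS()


params = {"client_id": "id", "tenant_id": "tenant", "client_secret": "secret"}

# Forwarding the parameters lets the nested construction succeed.
ToyRootFS(params, forward=True).get_drive_fs()

try:
    ToyRootFS(params, forward=False).get_drive_fs()
except ValueError as exc:
    print(f"nested construction failed: {exc}")
```

If something like this is happening, inspecting the arguments that reach the second `__init__` (for example with a breakpoint at the line named in the traceback) would confirm or rule it out.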


At this point, I am a bit stuck. I might just have to use the `msgraphfs` 
library directly in my DAGs instead of going through the Airflow provider. But 
maybe you have some ideas?

GitHub link: 
https://github.com/apache/airflow/discussions/58221#discussioncomment-15062881
