GitHub user LucasRoesler added a comment to the discussion: Sharepoint ingest using Microsoft Graph Filesystem
Sadly, I don't think my PR #58568 is enough to make this functional. When I use the patch locally, it can find the filesystem, but I get errors during initialization.

1. I _must_ pass the `conn_id` parameter to `ObjectStoragePath`. The [documentation here](https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/stable/filesystems/msgraph.html) suggests that it will be inferred from the file URI. However, I was able to show definitively that this doesn't work by adding a log line to the start of the `get_fs` method in the provider:

```python
def get_fs(conn_id: str | None, storage_options: dict[str, Any] | None = None) -> AbstractFileSystem:
    from msgraphfs import MSGDriveFS

    if conn_id is None:
        logger.warning("No connection ID provided, using default MSGDriveFS loading from MSGRAPHFS env variables.")
        return MSGDriveFS({})
```

This produces errors that look like this:

```text
[2025-11-24, 14:12:14] WARNING - No connection ID provided, using default MSGDriveFS loading from MSGRAPHFS env variables.: source="airflow.providers.microsoft.azure.fs.msgraph"
[2025-11-24, 14:12:14] ERROR - Task failed with exception: source="task"
ValueError: Either oauth2_client_params must be provided, or all of client_id, tenant_id, and client_secret must be provided (either as parameters or environment variables MSGRAPHFS_CLIENT_ID/AZURE_CLIENT_ID, MSGRAPHFS_TENANT_ID/AZURE_TENANT_ID, MSGRAPHFS_CLIENT_SECRET/AZURE_CLIENT_SECRET)
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 920 in run
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 1215 in _execute_task
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/bases/operator.py", line 397 in wrapper
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/bases/decorator.py", line 251 in execute
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/bases/operator.py", line 397 in wrapper
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/providers/standard/operators/python.py", line 216 in execute
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/providers/standard/operators/python.py", line 239 in execute_callable
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/execution_time/callback_runner.py", line 81 in run
File "/home/lucas/code/cool-project/data_platform/dags/examples/sharepoint_to_blob_dag.py", line 32 in list_sharepoint_files
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/upath/core.py", line 1442 in exists
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/upath/core.py", line 300 in fs
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/io/path.py", line 110 in _fs_factory
File "/home/lucas/.local/share/uv/python/cpython-3.12.5-linux-x86_64-gnu/lib/python3.12/functools.py", line 993 in __get__
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/io/store.py", line 63 in fs
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/io/__init__.py", line 108 in get_fs
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/providers/microsoft/azure/fs/msgraph.py", line 39 in get_fs
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/fsspec/spec.py", line 84 in __call__
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/msgraphfs/core.py", line 1443 in __init__
AttributeError: 'ObjectStoragePath' object has no attribute '_fs_cached'
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/upath/core.py", line 298 in fs
```

The documentation is _very_ misleading: either it should be updated to show that `conn_id` is _always_ required
**or** the code should be updated to do the implied extraction when `conn_id` _is_ `None`, **or** the documentation should state explicitly that not passing a `conn_id` causes it to inspect the environment variables, document those variables, and link back to the original documentation for more details.

2. Once you pass the `conn_id`, you continue to get similar errors. This led me to add another debugging log line to verify that a connection is found and to inspect its contents:

```python
def get_fs(conn_id: str | None, storage_options: dict[str, Any] | None = None) -> AbstractFileSystem:
    from msgraphfs import MSGDriveFS

    if conn_id is None:
        logger.warning("No connection ID provided, using default MSGDriveFS loading from MSGRAPHFS env variables.")
        return MSGDriveFS({})

    conn = BaseHook.get_connection(conn_id)
    extras = conn.extra_dejson
    conn_type = conn.conn_type or "msgraph"
    logger.info(f"Connection: {conn_id}, {conn}")
```

This produces logs like this:

```text
[2025-11-24, 14:21:18] INFO - Connection: sharepoint, Connection(conn_id='sharepoint', conn_type='msgraph', description=None, host='secret-d265-4611-8dbb-secret', schema=None, login='secret-63e2-4f4d-8270-secret', password='***', port=None, extra='{\n "scope": [\n "https://graph.microsoft.com/.default"\n ]\n}'): chan="stdout": source="task"
[2025-11-24, 14:21:18] ERROR - Task failed with exception: source="task"
ValueError: Either oauth2_client_params must be provided, or all of client_id, tenant_id, and client_secret must be provided (either as parameters or environment variables MSGRAPHFS_CLIENT_ID/AZURE_CLIENT_ID, MSGRAPHFS_TENANT_ID/AZURE_TENANT_ID, MSGRAPHFS_CLIENT_SECRET/AZURE_CLIENT_SECRET)
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 920 in run
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 1215 in _execute_task
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/bases/operator.py", line 397 in wrapper
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/bases/decorator.py", line 251 in execute
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/bases/operator.py", line 397 in wrapper
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/providers/standard/operators/python.py", line 216 in execute
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/providers/standard/operators/python.py", line 239 in execute_callable
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/airflow/sdk/execution_time/callback_runner.py", line 81 in run
File "/home/lucas/code/cool-project/data_platform/dags/examples/sharepoint_to_blob_dag.py", line 33 in list_sharepoint_files
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/upath/core.py", line 1442 in exists
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/fsspec/asyn.py", line 118 in wrapper
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/fsspec/asyn.py", line 103 in sync
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/fsspec/asyn.py", line 56 in _runner
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/msgraphfs/core.py", line 1833 in _exists
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/msgraphfs/core.py", line 1583 in _get_drive_fs
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/fsspec/spec.py", line 84 in __call__
File "/home/lucas/code/cool-project/.venv/lib/python3.12/site-packages/msgraphfs/core.py", line 1443 in __init__
```

The rest of the code in `get_fs` looks like various attempts to extract additional configuration from the connection object.
So, I also added this debugging to the _end_ of the `get_fs` method:

```python
logger.warning(f"MSGraphFS options: {options}")
logger.warning(f"OAuth2 client params: {oauth2_client_params}")
return MSGDriveFS(drive_id=drive_id, oauth2_client_params=oauth2_client_params)
```

This results in logs like this:

```text
[2025-11-24, 14:30:16] WARNING - MSGraphFS options: {'client_id': 'secret-63e2-4f4d-8270-secret', 'client_secret': '***', 'tenant_id': 'secret-d265-4611-8dbb-secret', 'scope': ['https://graph.microsoft.com/.default']}: source="airflow.providers.microsoft.azure.fs.msgraph"
[2025-11-24, 14:30:16] WARNING - OAuth2 client params: {'client_id': 'secret-63e2-4f4d-8270-secret', 'client_secret': '***', 'tenant_id': 'secret-d265-4611-8dbb-secret', 'scope': ['https://graph.microsoft.com/.default']}: source="airflow.providers.microsoft.azure.fs.msgraph"
[2025-11-24, 14:30:16] ERROR - Task failed with exception: source="task"
ValueError: Either oauth2_client_params must be provided, or all of client_id, tenant_id, and client_secret must be provided (either as parameters or environment variables MSGRAPHFS_CLIENT_ID/AZURE_CLIENT_ID, MSGRAPHFS_TENANT_ID/AZURE_TENANT_ID, MSGRAPHFS_CLIENT_SECRET/AZURE_CLIENT_SECRET)
# ... truncated for brevity ...
```

3. Now that we are sure that it is (a) finding a connection and (b) extracting parameters from it, the next step is digging into the init chain for `MSGDriveFS`.
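
To make point (b) concrete, the extraction that the logs above reflect can be sketched roughly as follows. This is not the provider's actual code: `ConnectionStub`, `build_oauth2_client_params`, and `missing_credentials` are hypothetical names, and the field mapping (login as `client_id`, password as `client_secret`, host as `tenant_id`) is an assumption inferred from the logged values, where the connection's `login` and `host` match the logged `client_id` and `tenant_id`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ConnectionStub:
    """Hypothetical stand-in for an Airflow Connection, so the sketch runs without Airflow."""

    conn_id: str
    login: str | None = None
    password: str | None = None
    host: str | None = None
    extra_dejson: dict[str, Any] = field(default_factory=dict)


def build_oauth2_client_params(conn: ConnectionStub) -> dict[str, Any]:
    """Map connection fields to OAuth2 parameters (assumed mapping, see lead-in)."""
    extras = conn.extra_dejson
    return {
        # Assumption: login holds the client ID, with the extras as a fallback.
        "client_id": conn.login or extras.get("client_id"),
        # Assumption: password holds the client secret.
        "client_secret": conn.password or extras.get("client_secret"),
        # Assumption: host holds the tenant ID.
        "tenant_id": conn.host or extras.get("tenant_id"),
        # The scope is taken from the extras as-is; no default is invented here.
        "scope": extras.get("scope"),
    }


def missing_credentials(params: dict[str, Any]) -> list[str]:
    """Return the names of the required credentials that are empty or absent."""
    required = ("client_id", "client_secret", "tenant_id")
    return [name for name in required if not params.get(name)]
```

For a connection shaped like the `sharepoint` one logged above (login, password, and host all set), `missing_credentials` returns an empty list, which is what makes the `ValueError` surprising.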
When I went directly to the line the error is coming from in `msgraphfs/core.py`, I found this block:

```python
if oauth2_client_params is None:
    if not all([self.client_id, self.tenant_id, self.client_secret]):
        raise ValueError(
            "Either oauth2_client_params must be provided, or all of "
            "client_id, tenant_id, and client_secret must be provided "
            "(either as parameters or environment variables MSGRAPHFS_CLIENT_ID/"
            "AZURE_CLIENT_ID, MSGRAPHFS_TENANT_ID/AZURE_TENANT_ID, "
            "MSGRAPHFS_CLIENT_SECRET/AZURE_CLIENT_SECRET)"
        )
```

This is bizarre because my logging above shows that we are providing a non-`None` value for `oauth2_client_params`.

At this point, I am a bit stuck. I might just have to use the `msgraphfs` library directly in my DAGs instead of going through the Airflow provider. But maybe you have some ideas?

GitHub link: https://github.com/apache/airflow/discussions/58221#discussioncomment-15062881
