GitHub user LucasRoesler added a comment to the discussion: Sharepoint ingest
using Microsoft Graph Filesystem
OK, I have confirmed that direct initialization of the filesystem does not do
this double init and that it actually works as expected. Here is my task
definition, where I attempt both:
```python
import logging
from datetime import datetime

from airflow.providers.common.compat.sdk import BaseHook
from airflow.sdk import DAG, ObjectStoragePath, Variable, task
from msgraphfs import MSGDriveFS

logger = logging.getLogger(__name__)


@task
def list_sharepoint_files() -> list[dict[str, str]]:
    """
    List files in SharePoint folder matching the configured pattern.
    """
    drive_id = Variable.get("sharepoint_drive_id")
    source_folder = Variable.get("sharepoint_source_folder")
    file_pattern = Variable.get("sharepoint_file_pattern", default="*")
    msgraph_conn_id = "sharepoint"
    conn = BaseHook.get_connection(msgraph_conn_id)

    fs = MSGDriveFS(
        client_id=conn.login,
        tenant_id=conn.host,
        client_secret=conn.password,
        # url_path=f"sharepoint://{drive_id}/{source_folder}/",
    )
    for p in fs.ls(f"/{drive_id}/{source_folder}/"):
        logger.warning(f" - fs://{p}")

    logger.warning("Using ObjectStoragePath to list files")
    # Build SharePoint source path
    source_path = ObjectStoragePath(
        f"sharepoint://{msgraph_conn_id}/{drive_id}/{source_folder}/",
        conn_id=msgraph_conn_id,
    )
    logger.info(f"Listing files in SharePoint: {source_path}")
    logger.info(f"File pattern: {file_pattern}")
    if not source_path.exists():
        raise FileNotFoundError(f"SharePoint source folder does not exist: {source_path}")

    matched_files = []
    # Recursively search for matching files
    for item in source_path.rglob(file_pattern):
        if item.is_file():
            # Store relative path from source folder
            relative_path = item.relative_to(source_path)
            matched_files.append(
                {
                    "path": str(relative_path),
                    "name": item.name,
                    "full_path": str(item),
                }
            )
    logger.info(f"Found {len(matched_files)} matching files")
    for file_info in matched_files:
        logger.info(f" - {file_info['path']}")
    return matched_files


with DAG(
    dag_id="sharepoint_to_blob",
    description="Copy files from SharePoint to Azure Blob Storage",
    schedule="@daily",
    start_date=datetime(2025, 1, 1),
    catchup=False,
    is_paused_upon_creation=True,
    tags=["sharepoint", "data-ingestion"],
) as dag:
    files = list_sharepoint_files()
```
And the resulting logs
```text
[2025-11-24, 15:25:36] WARNING - MSGDriveFS drive_id=None
client_id=secret-63e2-4f4d-8270-secret tenant_id=secret-d265-4611-8dbb-secret
client_secret=*** oauth2_client_params=None: source="msgraphfs.core"
[2025-11-24, 15:25:36] WARNING - MSGDriveFS initialized in
multi_site_mode=True: source="msgraphfs.core"
[2025-11-24, 15:25:36] WARNING - MSGDriveFS oauth2_client_params={'client_id':
'secret-63e2-4f4d-8270-secret', 'client_secret': '***', 'token_endpoint':
'https://login.microsoftonline.com/secret-d265-4611-8dbb-secret/oauth2/v2.0/token',
'scope': 'https://graph.microsoft.com/.default', 'grant_type':
'client_credentials'}: source="msgraphfs.core"
[2025-11-24, 15:25:36] WARNING - MSGDriveFS post-super site_name=None
drive_name=None drive_id=None: source="msgraphfs.core"
[2025-11-24, 15:25:36] WARNING - MSGDriveFS drive_id=None
client_id=secret-63e2-4f4d-8270-secret tenant_id=secret-d265-4611-8dbb-secret
client_secret=*** oauth2_client_params=None: source="msgraphfs.core"
[2025-11-24, 15:25:36] WARNING - MSGDriveFS initialized in
multi_site_mode=False: source="msgraphfs.core"
[2025-11-24, 15:25:36] WARNING - MSGDriveFS oauth2_client_params={'client_id':
'secret-63e2-4f4d-8270-secret', 'client_secret': '***', 'token_endpoint':
'https://login.microsoftonline.com/secret-d265-4611-8dbb-secret/oauth2/v2.0/token',
'scope': 'https://graph.microsoft.com/.default', 'grant_type':
'client_credentials'}: source="msgraphfs.core"
[2025-11-24, 15:25:36] WARNING - MSGDriveFS post-super site_name=CTMO GSUS
PRO-89 drive_name=Manual Data drive_id=None: source="msgraphfs.core"
[2025-11-24, 15:25:37] WARNING - - fs://{'name': '/PII Metadata', 'size':
271142, 'type': 'directory', 'addtitional_hidden_fields': '...'}:
source="unusual_prefix_c7e44d46fe65fb6e0814d4788b8cf147a1576372_sharepoint_to_blob_dag"
####
# space added to make it easier to read where direct MSGraphFS ends and where
# ObjectStoragePath logging starts
####
[2025-11-24, 15:25:37] WARNING - Using ObjectStoragePath to list files:
source="unusual_prefix_c7e44d46fe65fb6e0814d4788b8cf147a1576372_sharepoint_to_blob_dag"
[2025-11-24, 15:25:37] INFO - Connection: sharepoint,
Connection(conn_id='sharepoint', conn_type='msgraph', description=None,
host='secret-d265-4611-8dbb-secret', schema=None,
login='secret-63e2-4f4d-8270-secret', password='***', port=None, extra='{\n
"scope": [\n "https://graph.microsoft.com/.default"\n ]\n}'):
chan="stdout": source="task"
[2025-11-24, 15:25:37] WARNING - MSGraphFS options: {'client_id':
'secret-63e2-4f4d-8270-secret', 'client_secret': '***', 'tenant_id':
'secret-d265-4611-8dbb-secret', 'scope':
['https://graph.microsoft.com/.default']}:
source="airflow.providers.microsoft.azure.fs.msgraph"
[2025-11-24, 15:25:37] WARNING - OAuth2 client params: {'client_id':
'secret-63e2-4f4d-8270-secret', 'client_secret': '***', 'tenant_id':
'secret-d265-4611-8dbb-secret', 'scope':
['https://graph.microsoft.com/.default']}:
source="airflow.providers.microsoft.azure.fs.msgraph"
[2025-11-24, 15:25:37] WARNING - MSGDriveFS drive_id=None client_id=None
tenant_id=None client_secret=None oauth2_client_params={'client_id':
'secret-63e2-4f4d-8270-secret', 'client_secret': '***', 'tenant_id':
'secret-d265-4611-8dbb-secret', 'scope':
['https://graph.microsoft.com/.default']}: source="msgraphfs.core"
[2025-11-24, 15:25:37] WARNING - MSGDriveFS initialized in
multi_site_mode=True: source="msgraphfs.core"
[2025-11-24, 15:25:37] WARNING - MSGDriveFS oauth2_client_params={'client_id':
'secret-63e2-4f4d-8270-secret', 'client_secret': '***', 'tenant_id':
'secret-d265-4611-8dbb-secret', 'scope':
['https://graph.microsoft.com/.default']}: source="msgraphfs.core"
[2025-11-24, 15:25:37] WARNING - MSGDriveFS post-super site_name=None
drive_name=None drive_id=None: source="msgraphfs.core"
[2025-11-24, 15:25:37] WARNING - MSGDriveFS drive_id=None
client_id=secret-63e2-4f4d-8270-secret tenant_id=None client_secret=***
oauth2_client_params=None: source="msgraphfs.core"
[2025-11-24, 15:25:37] WARNING - MSGDriveFS initialized in
multi_site_mode=False: source="msgraphfs.core"
[2025-11-24, 15:25:37] ERROR - Task failed with exception: source="task"
ValueError: Either oauth2_client_params must be provided, or all of client_id,
tenant_id, and client_secret must be provided (either as parameters or
environment variables MSGRAPHFS_CLIENT_ID/AZURE_CLIENT_ID,
MSGRAPHFS_TENANT_ID/AZURE_TENANT_ID,
MSGRAPHFS_CLIENT_SECRET/AZURE_CLIENT_SECRET)
```
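To make the difference between the four `MSGDriveFS` inits in the logs easier to see, here is a minimal sketch that applies the rule as worded in the `ValueError` message to the arguments each init logged. `credentials_ok` is a hypothetical helper written for this illustration, not code from msgraphfs, and it ignores the environment-variable fallback the message mentions. The `"id"` and `"tenant"` values are placeholders for the redacted IDs.

```python
from typing import Optional


def credentials_ok(
    oauth2_client_params: Optional[dict] = None,
    client_id: Optional[str] = None,
    tenant_id: Optional[str] = None,
    client_secret: Optional[str] = None,
) -> bool:
    """Hypothetical helper: the rule as worded in the ValueError message.

    Not msgraphfs code; the environment-variable fallback is ignored here.
    """
    if oauth2_client_params:
        return True
    return all([client_id, tenant_id, client_secret])


# Argument combinations taken from the four "MSGDriveFS drive_id=None ..." log lines.
logged_inits = {
    "direct, first init": dict(
        client_id="id", tenant_id="tenant", client_secret="***"
    ),
    "direct, second init": dict(
        client_id="id", tenant_id="tenant", client_secret="***"
    ),
    "ObjectStoragePath, first init": dict(
        oauth2_client_params={
            "client_id": "id",
            "client_secret": "***",
            "tenant_id": "tenant",
            "scope": ["https://graph.microsoft.com/.default"],
        }
    ),
    "ObjectStoragePath, second init": dict(
        client_id="id", tenant_id=None, client_secret="***"
    ),
}

results = {name: credentials_ok(**kwargs) for name, kwargs in logged_inits.items()}
for name, ok in results.items():
    print(f"{name}: {'ok' if ok else 'would raise ValueError'}")
```

Under these assumptions only the second init on the `ObjectStoragePath` route fails the check: it logs `tenant_id=None` together with `oauth2_client_params=None`, whereas the second init on the direct route still has all three credentials.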
GitHub link:
https://github.com/apache/airflow/discussions/58221#discussioncomment-15063359