GitHub user LucasRoesler added a comment to the discussion: Sharepoint ingest 
using Microsoft Graph Filesystem

OK, I have confirmed that direct initialization of the filesystem does not do
this double init and that it actually works as expected. Here is my task
definition, where I attempt both:


```python
import logging
from datetime import datetime

from airflow.providers.common.compat.sdk import BaseHook
from airflow.sdk import DAG, ObjectStoragePath, Variable, task
from msgraphfs import MSGDriveFS

logger = logging.getLogger(__name__)


@task
def list_sharepoint_files() -> list[dict[str, str]]:
    """
    List files in SharePoint folder matching the configured pattern.
    """
    drive_id = Variable.get("sharepoint_drive_id")
    source_folder = Variable.get("sharepoint_source_folder")
    file_pattern = Variable.get("sharepoint_file_pattern", default="*")
    msgraph_conn_id = "sharepoint"

    conn = BaseHook.get_connection(msgraph_conn_id)

    fs = MSGDriveFS(
        client_id=conn.login,
        tenant_id=conn.host,
        client_secret=conn.password,
        # url_path=f"sharepoint://{drive_id}/{source_folder}/",
    )

    for p in fs.ls(f"/{drive_id}/{source_folder}/"):
        logger.warning(f"  - fs://{p}")

    logger.warning("Using ObjectStoragePath to list files")
    # Build SharePoint source path
    source_path = ObjectStoragePath(
        f"sharepoint://{msgraph_conn_id}/{drive_id}/{source_folder}/",
        conn_id=msgraph_conn_id,
    )

    logger.info(f"Listing files in SharePoint: {source_path}")
    logger.info(f"File pattern: {file_pattern}")

    if not source_path.exists():
        raise FileNotFoundError(
            f"SharePoint source folder does not exist: {source_path}"
        )

    matched_files = []

    # Recursively search for matching files
    for item in source_path.rglob(file_pattern):
        if item.is_file():
            # Store relative path from source folder
            relative_path = item.relative_to(source_path)
            matched_files.append(
                {
                    "path": str(relative_path),
                    "name": item.name,
                    "full_path": str(item),
                }
            )

    logger.info(f"Found {len(matched_files)} matching files")
    for file_info in matched_files:
        logger.info(f"  - {file_info['path']}")

    return matched_files


with DAG(
    dag_id="sharepoint_to_blob",
    description="Copy files from SharePoint to Azure Blob Storage",
    schedule="@daily",
    start_date=datetime(2025, 1, 1),
    catchup=False,
    is_paused_upon_creation=True,
    tags=["sharepoint", "data-ingestion"],
) as dag:
    files = list_sharepoint_files()
```

And the resulting logs:

```text
[2025-11-24, 15:25:36] WARNING - MSGDriveFS drive_id=None 
client_id=secret-63e2-4f4d-8270-secret tenant_id=secret-d265-4611-8dbb-secret 
client_secret=*** oauth2_client_params=None: source="msgraphfs.core"
[2025-11-24, 15:25:36] WARNING - MSGDriveFS initialized in 
multi_site_mode=True: source="msgraphfs.core"
[2025-11-24, 15:25:36] WARNING - MSGDriveFS oauth2_client_params={'client_id': 
'secret-63e2-4f4d-8270-secret', 'client_secret': '***', 'token_endpoint': 
'https://login.microsoftonline.com/secret-d265-4611-8dbb-secret/oauth2/v2.0/token',
 'scope': 'https://graph.microsoft.com/.default', 'grant_type': 
'client_credentials'}: source="msgraphfs.core"
[2025-11-24, 15:25:36] WARNING - MSGDriveFS post-super site_name=None 
drive_name=None drive_id=None: source="msgraphfs.core"
[2025-11-24, 15:25:36] WARNING - MSGDriveFS drive_id=None 
client_id=secret-63e2-4f4d-8270-secret tenant_id=secret-d265-4611-8dbb-secret 
client_secret=*** oauth2_client_params=None: source="msgraphfs.core"
[2025-11-24, 15:25:36] WARNING - MSGDriveFS initialized in 
multi_site_mode=False: source="msgraphfs.core"
[2025-11-24, 15:25:36] WARNING - MSGDriveFS oauth2_client_params={'client_id': 
'secret-63e2-4f4d-8270-secret', 'client_secret': '***', 'token_endpoint': 
'https://login.microsoftonline.com/secret-d265-4611-8dbb-secret/oauth2/v2.0/token',
 'scope': 'https://graph.microsoft.com/.default', 'grant_type': 
'client_credentials'}: source="msgraphfs.core"
[2025-11-24, 15:25:36] WARNING - MSGDriveFS post-super site_name=CTMO GSUS 
PRO-89 drive_name=Manual Data drive_id=None: source="msgraphfs.core"
[2025-11-24, 15:25:37] WARNING -   - fs://{'name': '/PII Metadata', 'size': 
271142, 'type': 'directory', 'addtitional_hidden_fields': '...'}: 
source="unusual_prefix_c7e44d46fe65fb6e0814d4788b8cf147a1576372_sharepoint_to_blob_dag"

####
# space added to make it easier to read where direct MSGraphFS ends and where 
ObjectStoragePath logging starts 
####

[2025-11-24, 15:25:37] WARNING - Using ObjectStoragePath to list files: 
source="unusual_prefix_c7e44d46fe65fb6e0814d4788b8cf147a1576372_sharepoint_to_blob_dag"
[2025-11-24, 15:25:37] INFO - Connection: sharepoint, 
Connection(conn_id='sharepoint', conn_type='msgraph', description=None, 
host='secret-d265-4611-8dbb-secret', schema=None, 
login='secret-63e2-4f4d-8270-secret', password='***', port=None, extra='{\n  
"scope": [\n    "https://graph.microsoft.com/.default"\n  ]\n}'): 
chan="stdout": source="task"
[2025-11-24, 15:25:37] WARNING - MSGraphFS options: {'client_id': 
'secret-63e2-4f4d-8270-secret', 'client_secret': '***', 'tenant_id': 
'secret-d265-4611-8dbb-secret', 'scope': 
['https://graph.microsoft.com/.default']}: 
source="airflow.providers.microsoft.azure.fs.msgraph"
[2025-11-24, 15:25:37] WARNING - OAuth2 client params: {'client_id': 
'secret-63e2-4f4d-8270-secret', 'client_secret': '***', 'tenant_id': 
'secret-d265-4611-8dbb-secret', 'scope': 
['https://graph.microsoft.com/.default']}: 
source="airflow.providers.microsoft.azure.fs.msgraph"
[2025-11-24, 15:25:37] WARNING - MSGDriveFS drive_id=None client_id=None 
tenant_id=None client_secret=None oauth2_client_params={'client_id': 
'secret-63e2-4f4d-8270-secret', 'client_secret': '***', 'tenant_id': 
'secret-d265-4611-8dbb-secret', 'scope': 
['https://graph.microsoft.com/.default']}: source="msgraphfs.core"
[2025-11-24, 15:25:37] WARNING - MSGDriveFS initialized in 
multi_site_mode=True: source="msgraphfs.core"
[2025-11-24, 15:25:37] WARNING - MSGDriveFS oauth2_client_params={'client_id': 
'secret-63e2-4f4d-8270-secret', 'client_secret': '***', 'tenant_id': 
'secret-d265-4611-8dbb-secret', 'scope': 
['https://graph.microsoft.com/.default']}: source="msgraphfs.core"
[2025-11-24, 15:25:37] WARNING - MSGDriveFS post-super site_name=None 
drive_name=None drive_id=None: source="msgraphfs.core"
[2025-11-24, 15:25:37] WARNING - MSGDriveFS drive_id=None 
client_id=secret-63e2-4f4d-8270-secret tenant_id=None client_secret=*** 
oauth2_client_params=None: source="msgraphfs.core"
[2025-11-24, 15:25:37] WARNING - MSGDriveFS initialized in 
multi_site_mode=False: source="msgraphfs.core"
[2025-11-24, 15:25:37] ERROR - Task failed with exception: source="task"
ValueError: Either oauth2_client_params must be provided, or all of client_id, 
tenant_id, and client_secret must be provided (either as parameters or 
environment variables MSGRAPHFS_CLIENT_ID/AZURE_CLIENT_ID, 
MSGRAPHFS_TENANT_ID/AZURE_TENANT_ID, 
MSGRAPHFS_CLIENT_SECRET/AZURE_CLIENT_SECRET)
```
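To make the difference between the two halves of the log easier to see, here is a small stand-alone sketch of the rule stated in the `ValueError` message. It does not import or call msgraphfs: `has_usable_credentials` is a hypothetical helper written only for illustration, and it assumes the check behaves the way the error text describes. The two argument sets mirror the last `MSGDriveFS drive_id=...` line logged in each half, with the secrets replaced by placeholders.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def has_usable_credentials(
    client_id: str | None = None,
    tenant_id: str | None = None,
    client_secret: str | None = None,
    oauth2_client_params: dict | None = None,
    env: Mapping[str, str] | None = None,
) -> bool:
    """Hypothetical restatement of the rule in the ValueError message.

    This is NOT the msgraphfs implementation, only a reading of its error text.
    """
    env = os.environ if env is None else env

    # Either oauth2_client_params is provided ...
    if oauth2_client_params:
        return True

    def resolve(value: str | None, *names: str) -> str | None:
        # Fall back to the environment variables named in the error message.
        return value or next((env[n] for n in names if env.get(n)), None)

    # ... or all of client_id, tenant_id and client_secret must resolve.
    return all(
        (
            resolve(client_id, "MSGRAPHFS_CLIENT_ID", "AZURE_CLIENT_ID"),
            resolve(tenant_id, "MSGRAPHFS_TENANT_ID", "AZURE_TENANT_ID"),
            resolve(client_secret, "MSGRAPHFS_CLIENT_SECRET", "AZURE_CLIENT_SECRET"),
        )
    )


# Direct MSGDriveFS(...) call: all three values are present in the log line.
direct = has_usable_credentials(
    client_id="<client-id>", tenant_id="<tenant-id>", client_secret="<secret>", env={}
)

# Via ObjectStoragePath: the log line shows tenant_id=None and
# oauth2_client_params=None, so nothing satisfies the rule.
via_object_storage_path = has_usable_credentials(
    client_id="<client-id>", tenant_id=None, client_secret="<secret>", env={}
)

print(direct, via_object_storage_path)
```

Under that reading, the only value that differs between the working and the failing call is `tenant_id`, which matches the point in the log where the task fails.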


GitHub link: 
https://github.com/apache/airflow/discussions/58221#discussioncomment-15063359
