varaprasadregani opened a new issue, #59962: URL: https://github.com/apache/airflow/issues/59962
### Apache Airflow Provider(s)

samba

### Versions of Apache Airflow Providers

apache-airflow-providers-samba
apache-airflow-providers-microsoft-azure

### Apache Airflow version

3.0.6

### Operating System

macOS

### Deployment

Other Docker-based deployment

### Deployment details

_No response_

### What happened

When using the SambaHook (or sensors relying on it) in an environment where Remote Logging is enabled (e.g., S3, Azure Blob, GCS), log entries are duplicated. In observed cases, specific log lines are repeated multiple times.

This behavior was successfully replicated in a local Breeze environment with remote logging configured.

<img width="3406" height="2076" alt="Image" src="https://github.com/user-attachments/assets/74be2f66-9aef-42ba-880d-64ca7abf900e" />

Key Observations:

- Remote Logging Dependency: The duplication does not occur when Remote Logging is disabled (LocalExecutor writing to local files).
- Delayed Duplication: The logs often appear correct initially. The duplication tends to manifest after the task has been running for a few minutes or when the remote logs are refreshed/uploaded.

  Initial Logs:

  <img width="720" height="446" alt="Image" src="https://github.com/user-attachments/assets/29ff6f00-9e78-4621-91e0-76ea6f284e24" />

  After a few minutes:

  <img width="719" height="437" alt="Image" src="https://github.com/user-attachments/assets/affc0118-048a-4b4f-bc48-efd5f003250f" />

- Root Cause Suspect: It appears related to how the underlying smbprotocol library handles logging handlers in conjunction with Airflow's remote logging propagation.

### What you think should happen instead

Each log line should appear exactly once per emission, regardless of whether the logs are viewed locally or fetched from remote storage.

<img width="720" height="446" alt="Image" src="https://github.com/user-attachments/assets/a98152b8-4143-428c-a109-80db10ab4759" />

### How to reproduce

1. Set up an Airflow environment (e.g., Breeze).
2. Configure a Remote Logging backend (e.g., Azure Blob Storage or S3). Follow these [docs](https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/stable/logging/index.html):
   - `AIRFLOW__LOGGING__REMOTE_LOGGING`=True
   - `AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER`=...
3. Configure a Samba connection (e.g., azure_smb_default).
4. Trigger the following DAG, which uses a TaskFlow sensor to check for files via SambaHook.

Dag Code:

```python
from airflow.decorators import dag, task
from airflow.providers.samba.hooks.samba import SambaHook
from airflow.sensors.base import PokeReturnValue
from pendulum import datetime
import logging


@dag(
    start_date=datetime(2025, 4, 1),
    schedule="@hourly",
    catchup=False,
    tags=["reproduction", "samba", "logging"],
)
def azure_smb_logging_repro():
    @task.sensor(
        poke_interval=60,
        timeout=3600,
        mode="reschedule",
    )
    def check_files_exist(
        directory_path: str = "/testshare", samba_conn_id: str = "azure_smb_default"
    ) -> PokeReturnValue:
        try:
            with SambaHook(samba_conn_id=samba_conn_id) as samba_hook:
                print(f"Checking for files in directory: {directory_path}")
                files = samba_hook.listdir(directory_path)
                if files and len(files) > 0:
                    print(f"✓ Files detected! Found {len(files)} items.")
                    return PokeReturnValue(is_done=True)
                else:
                    print(f"✗ No files found in {directory_path}. Will retry...")
                    return PokeReturnValue(is_done=False)
        except Exception as e:
            print(f"Error while checking directory: {str(e)}")
            return PokeReturnValue(is_done=False)

    # Instantiate the sensor so the DAG actually contains a task.
    check_files_exist()


azure_smb_logging_repro()
```

### Anything else

The issue was replicated in Breeze using the default remote logging configuration. It seems that the smbprotocol library might be attaching handlers to the root logger in a way that conflicts with how Airflow's remote logging handles log propagation, causing the same message to be processed by multiple handlers.

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!
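### Sketch: checking the handler hypothesis

The suspected root cause (the same message reaching more than one handler through propagation) can be illustrated with the standard `logging` module alone. The sketch below is only a demonstration of that general mechanism under stated assumptions, not a confirmation of what smbprotocol or Airflow actually do: `ListHandler`, `effective_handlers`, `demo`, and the logger name `demo_lib.connection` are all hypothetical names introduced here.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical in-memory handler used to count emissions."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def effective_handlers(logger_name):
    """Return (owner logger name, handler) pairs that a record logged on
    `logger_name` would reach, walking up the hierarchy while `propagate`
    is true, which mirrors how the logging module dispatches records."""
    found = []
    current = logging.getLogger(logger_name)
    while current is not None:
        for handler in current.handlers:
            found.append((current.name, handler))
        if not current.propagate:
            break
        current = current.parent
    return found


def demo():
    """Attach one sink to both a library logger and the root logger,
    then show that a single log call is emitted twice."""
    sink = ListHandler()
    root = logging.getLogger()
    lib = logging.getLogger("demo_lib.connection")  # hypothetical logger name
    lib.setLevel(logging.INFO)
    root.addHandler(sink)
    lib.addHandler(sink)
    try:
        lib.info("Checking for files")
        owners = [
            name
            for name, handler in effective_handlers("demo_lib.connection")
            if handler is sink
        ]
        return list(sink.messages), owners
    finally:
        # Leave the global logging configuration as it was found.
        root.removeHandler(sink)
        lib.removeHandler(sink)


if __name__ == "__main__":
    messages, owners = demo()
    print(messages)
    print(owners)
```

Inside a running task, calling `effective_handlers("smbprotocol")` and comparing the result with and without remote logging enabled might help confirm or rule out this explanation. That the library logs under the `smbprotocol` name is an assumption here and should be checked against the installed version.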
### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
