varaprasadregani opened a new issue, #59962:
URL: https://github.com/apache/airflow/issues/59962

   ### Apache Airflow Provider(s)
   
   samba
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-samba
   apache-airflow-providers-microsoft-azure
   
   ### Apache Airflow version
   
   3.0.6
   
   ### Operating System
   
   MacOS
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   When using the SambaHook (or sensors relying on it) in an environment where 
Remote Logging is enabled (e.g., S3, Azure Blob, GCS), log entries are 
duplicated.
   
   In observed cases, specific log lines are repeated multiple times. This 
behavior was successfully replicated in a local Breeze environment with remote 
logging configured.
   
   <img width="3406" height="2076" alt="Image" src="https://github.com/user-attachments/assets/74be2f66-9aef-42ba-880d-64ca7abf900e" />
   
   Key Observations:
   - Remote Logging Dependency: The duplication does not occur when Remote 
Logging is disabled (LocalExecutor writing to local files).
   - Delayed Duplication: The logs often appear correct initially. The 
duplication tends to manifest after the task has been running for a few minutes 
or when the remote logs are refreshed/uploaded.
   Initial Logs:
   
   <img width="720" height="446" alt="Image" src="https://github.com/user-attachments/assets/29ff6f00-9e78-4621-91e0-76ea6f284e24" />
   
   After a few minutes:
   
   <img width="719" height="437" alt="Image" src="https://github.com/user-attachments/assets/affc0118-048a-4b4f-bc48-efd5f003250f" />
   
   - Root Cause Suspect: It appears related to how the underlying smbprotocol 
library handles logging handlers in conjunction with Airflow's remote logging 
propagation.
   
   ### What you think should happen instead
   
   Logs should appear once per emission, regardless of whether they are viewed 
locally or fetched from remote storage.
   
   <img width="720" height="446" alt="Image" src="https://github.com/user-attachments/assets/a98152b8-4143-428c-a109-80db10ab4759" />
   
   ### How to reproduce
   
   1. Set up an Airflow environment (e.g., Breeze).
   2. Configure a Remote Logging backend (e.g., Azure Blob Storage or S3), following the [docs](https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/stable/logging/index.html):
         - `AIRFLOW__LOGGING__REMOTE_LOGGING`=True
         - `AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER`=...
   3. Configure a Samba connection (e.g., `azure_smb_default`).
   4. Trigger the following DAG, which uses a TaskFlow sensor to check for files via `SambaHook`.
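   For reference, the remote-logging settings from step 2 can be exported as environment variables like this (the connection ID and container name below are placeholders, not the actual values from my setup):

```shell
# Placeholders: substitute your own connection ID and wasb container path.
export AIRFLOW__LOGGING__REMOTE_LOGGING=True
export AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=azure_blob_default
export AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER="wasb-airflow-logs"
```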
   
   DAG code:
   ```python
   from airflow.decorators import dag, task
   from airflow.providers.samba.hooks.samba import SambaHook
   from airflow.sensors.base import PokeReturnValue
   from pendulum import datetime


   @dag(
       start_date=datetime(2025, 4, 1),
       schedule="@hourly",
       catchup=False,
       tags=["reproduction", "samba", "logging"],
   )
   def azure_smb_logging_repro():

       @task.sensor(
           poke_interval=60,
           timeout=3600,
           mode="reschedule",
       )
       def check_files_exist(
           directory_path: str = "/testshare", samba_conn_id: str = "azure_smb_default"
       ) -> PokeReturnValue:
           try:
               with SambaHook(samba_conn_id=samba_conn_id) as samba_hook:
                   print(f"Checking for files in directory: {directory_path}")

                   files = samba_hook.listdir(directory_path)

                   if files:
                       print(f"✓ Files detected! Found {len(files)} items.")
                       return PokeReturnValue(is_done=True)
                   else:
                       print(f"✗ No files found in {directory_path}. Will retry...")
                       return PokeReturnValue(is_done=False)

           except Exception as e:
               print(f"Error while checking directory: {e}")
               return PokeReturnValue(is_done=False)

       check_files_exist()


   azure_smb_logging_repro()
   ```
   
   ### Anything else
   
   The issue was replicated in Breeze using the default remote logging 
configuration. It seems that the smbprotocol library might be attaching 
handlers to the root logger in a way that conflicts with how Airflow's remote 
logging handles log propagation, causing the same message to be processed by 
multiple handlers.
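   If that hypothesis holds, a possible workaround sketch (assuming smbprotocol's modules log under the `smbprotocol` logger namespace) is to strip any handlers the library attached and let records flow only through Airflow's configured handlers:

```python
import logging

# Assumption: smbprotocol logs under the "smbprotocol" logger namespace.
smb_logger = logging.getLogger("smbprotocol")
smb_logger.handlers.clear()  # drop any handlers the library attached itself
smb_logger.propagate = True  # records still reach Airflow's task handlers
```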
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

