nailo2c opened a new pull request, #61013:
URL: https://github.com/apache/airflow/pull/61013

   Closes: #58946
   
   # Why
   
   Following this [doc](https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/stable/logging/index.html) causes Airflow to write logs under an unexpected `<no name>` folder in Azure Blob Storage.
   
   <img width="2558" height="801" alt="issue-58946-reproduce" src="https://github.com/user-attachments/assets/ef1013ba-63f4-4093-9c30-8ee0d54d0a5b" />
   
   # How
   
   `wasb` is used as a flag to select the Azure Blob remote logging implementation, but the current code path does not normalize the `wasb://` scheme.
   
   
https://github.com/apache/airflow/blob/89f109bcc85f328ece264bc73510496d3250edde/airflow-core/src/airflow/config_templates/airflow_local_settings.py#L215-L216
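
A minimal sketch of why both configuration styles reach the same handler, assuming the linked lines perform a plain `startswith("wasb")` check. The function name `selects_wasb_handler` is hypothetical and is not part of Airflow:

```python
def selects_wasb_handler(remote_base_log_folder: str) -> bool:
    """Hypothetical stand-in for the prefix check in airflow_local_settings.py."""
    # A bare "wasb" prefix matches the legacy container-style value
    # ("wasb-logs") as well as the URL-style value ("wasb://logs").
    return remote_base_log_folder.startswith("wasb")


for value in ("wasb-logs", "wasb://logs", "s3://bucket/logs"):
    print(f"{value!r}: {selects_wasb_handler(value)}")
```

Under that assumption the selection step cannot tell the two styles apart, so any scheme handling has to happen after the handler is chosen.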
   
   <br>
   
   In the documentation prior to version 12.4.1, users were instructed to use a `wasb-` prefix (e.g. `wasb-logs`). However, PR #51988 notes that this could lead to `ResourceNotFoundError` and authentication failures.
   
   As a result, the documentation was updated to the current guidance (using `remote_base_log_folder = wasb://...`):
   
   + Current docs: https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/stable/logging/index.html
   + Older docs (12.4.1): https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/12.4.1/logging/index.html
   
   But the current implementation treats `wasb://` as part of the blob name when constructing the remote log path:
   
   
https://github.com/apache/airflow/blob/89f109bcc85f328ece264bc73510496d3250edde/providers/microsoft/azure/src/airflow/providers/microsoft/azure/log/wasb_task_handler.py#L60
   
   1. `wasb-logs` (works)
   ```python
   self.remote_base = "wasb-logs"
   os.path.join("wasb-logs", "dag_id=xxx/task_id=yyy/attempt=1.log")

   # Result: "wasb-logs/dag_id=xxx/task_id=yyy/attempt=1.log"
   # This is a valid Azure Blob name
   ```
   
   2. `wasb://logs` (problematic)
   ```python
   self.remote_base = "wasb://logs"
   os.path.join("wasb://logs", "dag_id=xxx/task_id=yyy/attempt=1.log")
   
   # Result: "wasb://logs/dag_id=xxx/task_id=yyy/attempt=1.log"
   # The `wasb://` scheme is being treated as part of the blob name.
   # In Azure Portal, this ends up looking like a weird virtual folder structure,
   # e.g. `wasb:` / `{No Name}` / `logs` / ..
   ```
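
The two cases above can be reproduced outside Airflow. This sketch uses `posixpath.join` rather than `os.path.join` so the result is the same on every platform, and the helper name `virtual_folders` is made up for illustration. Splitting the blob name on `/` shows the empty segment between the two slashes of `wasb://`, which corresponds to the `{No Name}` folder described above:

```python
import posixpath


def virtual_folders(remote_base: str, relative_path: str) -> list[str]:
    """Join base and relative path, then split the blob name on '/'."""
    blob_name = posixpath.join(remote_base, relative_path)
    # Blob storage is flat; "/" in a blob name is only displayed as folders.
    return blob_name.split("/")


relative_path = "dag_id=xxx/task_id=yyy/attempt=1.log"

print(virtual_folders("wasb-logs", relative_path))
print(virtual_folders("wasb://logs", relative_path))
```

The second call yields `wasb:` followed by an empty string before `logs`, matching the folder layout in the screenshot.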
   
   # What
   
   Normalize `remote_base_log_folder` by stripping the `wasb://` prefix before constructing the blob key:
   ```python
   if remote_base_log_folder.startswith("wasb://"):
       wasb_remote_base = remote_base_log_folder.removeprefix("wasb://")
   else:
       wasb_remote_base = remote_base_log_folder
   ```
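
As a quick check of the normalization, here is a standalone sketch. The function name `normalize_wasb_base` is hypothetical and does not appear in the PR. It relies on `str.removeprefix` (Python 3.9+), which returns the string unchanged when the prefix is absent, so it behaves the same as the `if`/`else` above:

```python
def normalize_wasb_base(remote_base_log_folder: str) -> str:
    """Strip a leading 'wasb://' scheme; leave every other value untouched."""
    # removeprefix is a no-op when the string does not start with the prefix,
    # so legacy values such as "wasb-logs" pass through unchanged.
    return remote_base_log_folder.removeprefix("wasb://")


for value in ("wasb://logs", "wasb-logs"):
    print(f"{value!r} -> {normalize_wasb_base(value)!r}")
```

Both configuration styles then produce blob names without the scheme, and existing `wasb-...` settings keep their current behavior.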
   
   I intentionally did not change the selection logic to `elif remote_base_log_folder.startswith("wasb://"):` because I do not want to impact users who currently configure `wasb-...` in their settings.
   
https://github.com/apache/airflow/blob/89f109bcc85f328ece264bc73510496d3250edde/airflow-core/src/airflow/config_templates/airflow_local_settings.py#L215
   
   Result:
   <img width="1917" height="559" alt="issue-58946-fixed" src="https://github.com/user-attachments/assets/aac46363-d560-450a-bd8f-2c2675ff7bdd" />
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   <!--
   If generative AI tooling has been used in the process of authoring this PR, please
   change below checkbox to `[X]` followed by the name of the tool, uncomment the "Generated-by".
   -->
   
   - [ ] No (please specify the tool below)
   
   <!--
   Generated-by: [Tool Name] following [the guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)
   -->
   

