nailo2c opened a new pull request, #61013: URL: https://github.com/apache/airflow/pull/61013
Closes: #58946

# Why

Following this [doc](https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/stable/logging/index.html) causes Airflow to write logs under an unexpected `<no name>` folder in Azure Blob Storage.

<img width="2558" height="801" alt="issue-58946-reproduce" src="https://github.com/user-attachments/assets/ef1013ba-63f4-4093-9c30-8ee0d54d0a5b" />

# How

`wasb` is used as a flag to select the Azure Blob remote logging implementation, but the current code path does not normalize the `wasb://` scheme.

https://github.com/apache/airflow/blob/89f109bcc85f328ece264bc73510496d3250edde/airflow-core/src/airflow/config_templates/airflow_local_settings.py#L215-L216

<br>

In the documentation prior to version 12.4.1, users were instructed to use a `wasb-` prefix (e.g. `wasb-logs`). However, PR #51988 notes that this could lead to `ResourceNotFoundError` and authentication failures. As a result, the documentation was updated to the current guidance (using `remote_base_log_folder = wasb://...`):

+ Current docs: https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/stable/logging/index.html
+ Older docs (12.4.1): https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/12.4.1/logging/index.html

But the current implementation treats `wasb://` as part of the blob name when constructing the remote log path:

https://github.com/apache/airflow/blob/89f109bcc85f328ece264bc73510496d3250edde/providers/microsoft/azure/src/airflow/providers/microsoft/azure/log/wasb_task_handler.py#L60

1. `wasb-logs` (works)

   ```python
   self.remote_base = "wasb-logs"
   os.path.join("wasb-logs", "dag_id=xxx/task_id=yyy/attempt=1.log")
   # Result: "wasb-logs/dag_id=xxx/task_id=yyy/attempt=1.log"
   # This is a valid Azure Blob name
   ```

2. `wasb://logs` (problematic)

   ```python
   self.remote_base = "wasb://logs"
   os.path.join("wasb://logs", "dag_id=xxx/task_id=yyy/attempt=1.log")
   # Result: "wasb://logs/dag_id=xxx/task_id=yyy/attempt=1.log"
   # The `wasb://` scheme is being treated as part of the blob name.
   # In Azure Portal, this ends up looking like a weird virtual folder structure,
   # e.g. `wasb:` / `{No Name}` / `logs` / ..
   ```

# What

Normalize `remote_base_log_folder` by stripping the `wasb://` prefix before constructing the blob key:

```python
if remote_base_log_folder.startswith("wasb://"):
    wasb_remote_base = remote_base_log_folder.removeprefix("wasb://")
else:
    wasb_remote_base = remote_base_log_folder
```

I intentionally did not change the selection logic to `elif remote_base_log_folder.startswith("wasb://"):` because I do not want to impact users who currently configure `wasb-...` in their settings.

https://github.com/apache/airflow/blob/89f109bcc85f328ece264bc73510496d3250edde/airflow-core/src/airflow/config_templates/airflow_local_settings.py#L215

Result:

<img width="1917" height="559" alt="issue-58946-fixed" src="https://github.com/user-attachments/assets/aac46363-d560-450a-bd8f-2c2675ff7bdd" />

---

##### Was generative AI tooling used to co-author this PR?

<!-- If generative AI tooling has been used in the process of authoring this PR, please change below checkbox to `[X]` followed by the name of the tool, uncomment the "Generated-by". -->

- [ ] No (please specify the tool below)

<!-- Generated-by: [Tool Name] following [the guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions) -->
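
For reviewers who want to check the prefix handling in isolation, here is a minimal standalone sketch of the normalization described under "What". It is not the provider's code: `normalize_wasb_base` and `build_blob_name` are hypothetical helper names, and it uses `posixpath.join` instead of the `os.path.join` shown above so that the separator is `/` on every platform. It assumes Python 3.9+ for `str.removeprefix`.

```python
import posixpath

WASB_SCHEME = "wasb://"  # scheme named in the PR description


def normalize_wasb_base(remote_base_log_folder: str) -> str:
    """Hypothetical helper: drop a leading ``wasb://`` so it never reaches the blob name."""
    # str.removeprefix (Python 3.9+) returns the string unchanged when the
    # prefix is absent, so legacy values such as "wasb-logs" pass through.
    return remote_base_log_folder.removeprefix(WASB_SCHEME)


def build_blob_name(remote_base_log_folder: str, relative_log_path: str) -> str:
    """Hypothetical helper: join the normalized base with a task log path."""
    # posixpath.join always uses "/" as the separator; the examples in this
    # PR use os.path.join, which gives the same result on POSIX systems.
    return posixpath.join(normalize_wasb_base(remote_base_log_folder), relative_log_path)


relative = "dag_id=xxx/task_id=yyy/attempt=1.log"
print(build_blob_name("wasb-logs", relative))    # legacy value, left untouched
print(build_blob_name("wasb://logs", relative))  # scheme stripped before joining
```

Because `removeprefix` is a no-op when the prefix is missing, the sketch leaves `wasb-...` configurations unchanged, which matches the intent of not altering the handler selection logic.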
