wolfier opened a new issue, #32996:
URL: https://github.com/apache/airflow/issues/32996

   ### Apache Airflow version
   
   2.6.3
   
   ### What happened
   
   A task instance's [log_url](https://github.com/apache/airflow/blob/2.6.3/airflow/models/taskinstance.py#L726) does not contain the full URL defined in [base_url](https://github.com/apache/airflow/blob/2.6.3/airflow/models/taskinstance.py#L729C9-L729C69): part of the path configured in base_url is dropped.
   
   ### What you think should happen instead
   
   The base_url may contain a path, and that path should be preserved when building the log_url.
   
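   The log_url is built with [urljoin](https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urljoin). urljoin resolves its second argument as a relative reference, so when base_url has a path without a trailing slash, the last segment of that path is replaced by `log` instead of being extended. This produces a faulty URL.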
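For reference, the outcome of `urljoin` depends on whether the base path ends with a slash. Without a trailing slash, the last path segment is treated like a "file" and replaced. With one, the whole path is kept. The base URLs below are illustrative placeholders, not real deployments:

```python
from urllib.parse import urljoin

# "log?..." is a relative reference, so urljoin resolves it against the
# "directory" part of the base path, not against the full path.
query = "log?dag_id=super"

# No trailing slash: the final segment ("path") is replaced by "log".
print(urljoin("https://my.astronomer.run/path", query))
# -> https://my.astronomer.run/log?dag_id=super

# Nested path, no trailing slash: only the last segment ("b") is dropped.
print(urljoin("https://my.astronomer.run/a/b", query))
# -> https://my.astronomer.run/a/log?dag_id=super

# Trailing slash: the whole base path is preserved.
print(urljoin("https://my.astronomer.run/path/", query))
# -> https://my.astronomer.run/path/log?dag_id=super
```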
   
   ### How to reproduce
   
   This snippet shows how urljoin drops the existing path when building the URL.
   
   ```
   >>> from urllib.parse import urljoin
   >>> urljoin(
   ...     "https://my.astronomer.run/path",
   ...     "log?execution_date=test"
   ...     "&task_id=wow"
   ...     "&dag_id=super"
   ...     "&map_index=-1",
   ... )
   'https://my.astronomer.run/log?execution_date=test&task_id=wow&dag_id=super&map_index=-1'
   ```
   
   ### Operating System
   
   n/a
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Astronomer
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   One way to fix this is to use [urlsplit](https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlsplit) and [urlunsplit](https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlunsplit) so that the existing path is kept.
   
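   ```
   from urllib.parse import urlsplit, urlunsplit
   
   parts = urlsplit("https://my.astronomer.run/paths")
   url = urlunsplit(
       (
           parts.scheme,
           parts.netloc,
           # Extend the existing path instead of replacing its last segment;
           # rstrip avoids a double slash if the path ends with "/".
           f"{parts.path.rstrip('/')}/log",
           "execution_date=test"
           "&task_id=wow"
           "&dag_id=super"
           "&map_index=-1",
           "",  # no fragment
       )
   )
   print(url)
   ```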
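The snippet above can be generalized into a small helper. The sketch below assumes a hypothetical `build_log_url` function (not an Airflow API) that takes the base URL and a dict of query parameters. It builds the query with `urllib.parse.urlencode` instead of hand-concatenated strings, so values are percent-encoded:

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_log_url(base_url: str, params: dict) -> str:
    """Append "/log" to base_url's existing path, then attach the query.

    Hypothetical helper for illustration only; not part of Airflow.
    """
    parts = urlsplit(base_url)
    # Strip a trailing slash so ".../paths/" does not become ".../paths//log".
    path = f"{parts.path.rstrip('/')}/log"
    # urlencode percent-encodes each value and keeps the dict's insertion order.
    query = urlencode(params)
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


params = {"execution_date": "test", "task_id": "wow", "dag_id": "super", "map_index": -1}
print(build_log_url("https://my.astronomer.run/paths", params))
# -> https://my.astronomer.run/paths/log?execution_date=test&task_id=wow&dag_id=super&map_index=-1
```

A base URL without any path (for example `https://my.astronomer.run`) yields `https://my.astronomer.run/log?...`, which matches what urljoin produces today for that case.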
   
   Here is the fix in action.
   
   ```
   >>> from urllib.parse import urlsplit, urlunsplit
   >>> parts = urlsplit("https://my.astronomer.run/paths")
   >>> urlunsplit((
   ...     parts.scheme,
   ...     parts.netloc,
   ...     f"{parts.path.rstrip('/')}/log",
   ...     "execution_date=test"
   ...     "&task_id=wow"
   ...     "&dag_id=super"
   ...     "&map_index=-1",
   ...     ""))
   'https://my.astronomer.run/paths/log?execution_date=test&task_id=wow&dag_id=super&map_index=-1'
   >>>
   >>> parts = urlsplit("https://my.astronomer.run/paths/test")
   >>> urlunsplit((
   ...     parts.scheme,
   ...     parts.netloc,
   ...     f"{parts.path.rstrip('/')}/log",
   ...     "execution_date=test"
   ...     "&task_id=wow"
   ...     "&dag_id=super"
   ...     "&map_index=-1",
   ...     ""))
   'https://my.astronomer.run/paths/test/log?execution_date=test&task_id=wow&dag_id=super&map_index=-1'
   ```
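An alternative that keeps `urljoin` is to force exactly one trailing slash onto the base URL before joining, because urljoin treats a base path ending in `/` as a directory and keeps all of its segments. This is a sketch using a hypothetical `join_under_base` helper, not the approach Airflow necessarily adopted, checked against a few base URL shapes:

```python
from urllib.parse import urljoin


def join_under_base(base_url: str, relative: str) -> str:
    """Resolve `relative` underneath base_url's full path (hypothetical helper)."""
    # Ending the base with exactly one "/" makes urljoin keep every path segment.
    return urljoin(base_url.rstrip("/") + "/", relative)


relative = "log?dag_id=super&map_index=-1"
for base in (
    "https://my.astronomer.run",
    "https://my.astronomer.run/paths",
    "https://my.astronomer.run/paths/test/",
):
    print(join_under_base(base, relative))
# -> https://my.astronomer.run/log?dag_id=super&map_index=-1
# -> https://my.astronomer.run/paths/log?dag_id=super&map_index=-1
# -> https://my.astronomer.run/paths/test/log?dag_id=super&map_index=-1
```

This only works while the relative part does not start with `/`. A leading slash makes urljoin replace the entire base path again.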
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   