gemini-code-assist[bot] commented on code in PR #35849:
URL: https://github.com/apache/beam/pull/35849#discussion_r2268400604
##########
sdks/python/apache_beam/yaml/yaml_provider.py:
##########
@@ -1531,12 +1531,15 @@ def _join_url_or_filepath(base, path):
return path
base_scheme = urllib.parse.urlparse(base, '').scheme
path_scheme = urllib.parse.urlparse(path, base_scheme).scheme
- if path_scheme != base_scheme:
+ if path_scheme != base_scheme or path.startswith(path_scheme + "://"):
+ # path has its own scheme or path is an absolute path
return path
- elif base_scheme and base_scheme in urllib.parse.uses_relative:
- return urllib.parse.urljoin(base, path)
else:
- return FileSystems.join(FileSystems.split(base)[0], path)
+ # path is a relative path
+ if base_scheme and base_scheme in urllib.parse.uses_relative:
+ return urllib.parse.urljoin(base, path)
+ else:
+ return FileSystems.join(FileSystems.split(base)[0], path)
Review Comment:

The current logic correctly fixes the issue with absolute paths. However,
the logic for detecting an absolute URL is a bit complex and could be
simplified for better readability and robustness.
A more direct way to check if a path is an absolute URL is to see if it has
a scheme, using `urllib.parse.urlparse(path).scheme`. This avoids the need to
compare with `base_scheme` and the `startswith` check, which can be brittle.
Consider simplifying the logic as follows. This handles absolute URLs,
absolute file paths (via `FileSystems.join`), and relative paths correctly and
more clearly.
```python
if urllib.parse.urlparse(path).scheme:
return path
base_scheme = urllib.parse.urlparse(base).scheme
if base_scheme and base_scheme in urllib.parse.uses_relative:
return urllib.parse.urljoin(base, path)
else:
return FileSystems.join(FileSystems.split(base)[0], path)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]