gemini-code-assist[bot] commented on code in PR #38348:
URL: https://github.com/apache/beam/pull/38348#discussion_r3191478737
##########
sdks/python/apache_beam/runners/portability/stager.py:
##########
@@ -701,14 +701,30 @@ def _remove_dependency_from_requirements(
return tmp_requirements_filename
@staticmethod
- def _extract_local_packages(requirements_file):
+ def _extract_local_packages(requirements_file, temp_dir):
local_deps = []
pypi_deps = []
with open(requirements_file, 'r') as fin:
+ staging_temp_dir = tempfile.mkdtemp(dir=temp_dir)
for line in fin:
dep = line.strip()
+ parsed_url = urlparse(dep)
Review Comment:

The `urlparse` function is used here but it is not imported in this file.
This will cause a `NameError` at runtime. Please add `from urllib.parse import
urlparse` to the imports section of the file.
##########
sdks/python/apache_beam/runners/portability/stager.py:
##########
@@ -701,14 +701,30 @@ def _remove_dependency_from_requirements(
return tmp_requirements_filename
@staticmethod
- def _extract_local_packages(requirements_file):
+ def _extract_local_packages(requirements_file, temp_dir):
local_deps = []
pypi_deps = []
with open(requirements_file, 'r') as fin:
+ staging_temp_dir = tempfile.mkdtemp(dir=temp_dir)
for line in fin:
dep = line.strip()
+ parsed_url = urlparse(dep)
if os.path.exists(dep):
local_deps.append(dep)
+ elif parsed_url.scheme:
+ last_component = os.path.basename(parsed_url.path)
+ if not last_component:
+ _LOGGER.warning(
+ 'Extra package %s has an unexpected format hence will not be
downloaded locally.'
+ % dep)
+ continue
+ # Download remote package.
+ _LOGGER.info(
+ 'Downloading remote extra package: %s locally before staging',
+ dep)
+ local_file_path = FileSystems.join(staging_temp_dir, last_component)
+ Stager._download_file(dep, local_file_path)
+ local_deps.append(local_file_path)
Review Comment:

This implementation has two potential issues:
1. **Filename Collision**: If multiple URLs have the same basename (e.g.,
`.../v1/pkg.tgz` and `.../v2/pkg.tgz`), they will overwrite each other in the
temporary directory, and `local_deps` will contain the same path twice.
2. **Unsupported Schemes**: `FileSystems.open` (used in `_download_file`)
does not support VCS schemes like `git+https`. If such a URL is encountered,
the staging process will crash.
Adding a unique prefix to the filename and wrapping the download in a
try-except block to fall back to `pypi_deps` (letting `pip` handle it on the
worker) makes this more robust.
```suggestion
local_file_path = FileSystems.join(
staging_temp_dir, f"{len(local_deps)}_{last_component}")
try:
Stager._download_file(dep, local_file_path)
local_deps.append(local_file_path)
except Exception as e:
_LOGGER.warning(
'Failed to download remote package %s, falling back to pip:
%s',
dep,
e)
pypi_deps.append(dep)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]