[GitHub] [beam] tvalentyn commented on a diff in pull request #28564: [Python] Log dependencies at runtime and at submission environment

via GitHub Wed, 27 Sep 2023 11:44:14 -0700


tvalentyn commented on code in PR #28564:
URL: https://github.com/apache/beam/pull/28564#discussion_r1339013982



##########
sdks/python/apache_beam/runners/portability/stager.py:
##########
@@ -837,3 +846,20 @@ def _create_beam_sdk(sdk_remote_location, temp_dir):
     return [
         Stager._create_file_stage_to_artifact(local_download_file, staged_name)
     ]
+
+  @staticmethod
+  def _create_stage_submission_env_dependencies(temp_dir):
+    try:
+      local_dependency_file_path = os.path.join(
+          temp_dir, SUBMISSION_ENV_DEPENDENCIES_FILE)
+      dependencies = subprocess.check_output(
+          [sys.executable, '-m', 'pip', 'freeze'])
+      with open(local_dependency_file_path, 'w') as f:
+        f.write(str(dependencies))
+      return [
+          Stager._create_file_stage_to_artifact(
+              local_dependency_file_path, SUBMISSION_ENV_DEPENDENCIES_FILE)
+      ]
+    except Exception:
+      _LOGGER.debug("couldn't stage dependencies from submission environment")

Review Comment:
   nit: wording suggestion since we are not staging deps themselves like during 
requirements.txt processing.
   ```suggestion
         _LOGGER.warning("Couldn't stage a list of installed dependencies in 
submission environment.")
   ```
   
   also you could log the exception.



##########
sdks/python/apache_beam/runners/portability/stager.py:
##########
@@ -837,3 +846,20 @@ def _create_beam_sdk(sdk_remote_location, temp_dir):
     return [
         Stager._create_file_stage_to_artifact(local_download_file, staged_name)
     ]
+
+  @staticmethod
+  def _create_stage_submission_env_dependencies(temp_dir):

Review Comment:
   Let's add a here  in case someone is wondering why we do this staging. The 
purpose is to compare the both lists at runtime, make it possible for runners 
to warn users about possible mismatches and help debug issues related to 
environment mismatches .



##########
sdks/python/apache_beam/transforms/environments.py:
##########
@@ -124,9 +124,9 @@ def __init__(self,
         dict(resource_hints) if resource_hints else {})
 
   def __eq__(self, other):
+    # don't compare artifacts since they have different hashes in their names.
     return (
-        self.__class__ == other.__class__ and
-        self._artifacts == other._artifacts

Review Comment:
   this change is not obvious to me. I think we shouldn't include the staged 
job submission dependencies into the list of environment artifacts.
   
   The transform environment describes the runtime  environment the transforms 
should have on the runner. Job submission dependencies currently do not impose 
any requirements on runtime environment, so they should not be a part of 
environment definition. Would it resolve the issue?  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [beam] tvalentyn commented on a diff in pull request #28564: [Python] Log dependencies at runtime and at submission environment

Reply via email to