tvalentyn commented on code in PR #26220:
URL: https://github.com/apache/beam/pull/26220#discussion_r1169884132


##########
sdks/python/apache_beam/options/pipeline_options.py:
##########
@@ -876,14 +877,14 @@ def validate(self, validator):
 
     return errors
 
-  def lookup_dataflow_service_option(self, key, default=None):
+  def get_cloud_profiler_service_name(self, key):

Review Comment:
   given that we are looking for a specific option, `key` no longer needs to be 
a parameter. _ENABLE_GOOGLE_CLOUD_PROFILER  is the `key` .



##########
sdks/python/apache_beam/options/pipeline_options_test.py:
##########
@@ -626,6 +626,14 @@ def test_lookup_experiments(self):
     self.assertEqual(
         True, debug_options.lookup_experiment('existing_experiment'))
 
+  def test_get_cloud_profiler_service_name(self):

Review Comment:
   Let's test `None` and JOB_NAME case as well.  See for example 
https://stackoverflow.com/questions/31582750/python-mock-patch-os-environ-and-return-value/67477901#67477901
 how to mock an environment variable.
   
   Possible names for unit tests (ideally, test name includes tested scenario 
and expected result):
   
   ```
   test_profiler_uses_provided_service_name_when_specified
   test_profiler_uses_job_name_when_service_not_specified
   ```
   
   
   



##########
sdks/python/apache_beam/runners/worker/sdk_worker_main.py:
##########
@@ -173,22 +173,21 @@ def create_harness(environment, dry_run=False):
   return fn_log_handler, sdk_harness, sdk_pipeline_options
 
 
-def main(unused_argv):
-  """Main entry point for SDK Fn Harness."""
-  (fn_log_handler, sdk_harness,
-   sdk_pipeline_options) = create_harness(os.environ)
+def _start_profiler_if_enabled(sdk_pipeline_options):
   experiments = (sdk_pipeline_options.view_as(DebugOptions).experiments or [])
-  service_name = sdk_pipeline_options.view_as(
-      GoogleCloudOptions).lookup_dataflow_service_option(
-          _ENABLE_GOOGLE_CLOUD_PROFILER, default=os.environ["JOB_NAME"])
-
-  if ((_ENABLE_GOOGLE_CLOUD_PROFILER in experiments) or service_name):
+  gcp_profiler_service_name = sdk_pipeline_options.view_as(
+      GoogleCloudOptions).get_cloud_profiler_service_name(
+          _ENABLE_GOOGLE_CLOUD_PROFILER)
+  if (_ENABLE_GOOGLE_CLOUD_PROFILER in experiments

Review Comment:
   I believe there is a behavior change here if the 
_ENABLE_GOOGLE_CLOUD_PROFILER is specified in experiments without a service 
name. If I read this code correctly, `gcp_profiler_service_name` will evaluate 
to `None` now, and we will not enter line 187.
   
   You can catch such errors with better unit tests if you further decouple the 
logic to identify the cloud profiler configuration, and actually run the cloud 
profiler. You can introduce the another helper:
   
   ```
   def _start_profiler(_service_name, _service_version)
   ```
   
   and write unit tests that will mock out this function, and check what 
parameters it is being called with (see `unittest.mock.assert_called_with` . 
Then you can test both the case when cloud profiler is passed as an experiment 
and when cloud profiler is passed via dataflow service option.
   
   With such (a bit higher level) tests, you don't need to unit test the 
`get_cloud_profiler_service_name` helper as tests like: 
   
   `test_profiler_uses_provided_service_name_when_specified`
   `test_profiler_uses_job_name_when_service_not_specified`
   `test_profiler_uses_job_name_when_specified_as_experiment`
   
   will still exercise the  `get_cloud_profiler_service_name` helper.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to