[ 
https://issues.apache.org/jira/browse/BEAM-13595?focusedWorklogId=713901&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-713901
 ]

ASF GitHub Bot logged work on BEAM-13595:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 24/Jan/22 16:58
            Start Date: 24/Jan/22 16:58
    Worklog Time Spent: 10m 
      Work Description: ryanthompson591 commented on a change in pull request #16589:
URL: https://github.com/apache/beam/pull/16589#discussion_r790937340



##########
File path: sdks/python/apache_beam/runners/worker/sdk_worker_main.py
##########
@@ -76,8 +76,9 @@ def create_harness(environment, dry_run=False):
   # These are used for dataflow templates.
   RuntimeValueProvider.set_runtime_options(pipeline_options_dict)
   sdk_pipeline_options = PipelineOptions.from_dictionary(pipeline_options_dict)
+  pickle_library = sdk_pipeline_options.view_as(SetupOptions).pickle_library

Review comment:
       Move the line that declares this variable to just above where it is used.

##########
File path: sdks/python/apache_beam/runners/worker/sdk_worker_main.py
##########
@@ -87,17 +88,18 @@ def create_harness(environment, dry_run=False):
   _LOGGER.info('semi_persistent_directory: %s', semi_persistent_directory)
   _worker_id = environment.get('WORKER_ID', None)
 
-  try:
-    _load_main_session(semi_persistent_directory)
-  except CorruptMainSessionException:
-    exception_details = traceback.format_exc()
-    _LOGGER.error(
-        'Could not load main session: %s', exception_details, exc_info=True)
-    raise
-  except Exception:  # pylint: disable=broad-except
-    exception_details = traceback.format_exc()
-    _LOGGER.error(
-        'Could not load main session: %s', exception_details, exc_info=True)
+  if pickle_library != pickler.USE_CLOUDPICKLE:

Review comment:
       Later on, when we change the default to cloudpickle, won't this break?
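
       A minimal sketch of the concern, not the actual Beam code: it assumes
       the option can hold a placeholder value such as 'default' that is
       resolved to a concrete library elsewhere. The constants and all three
       helper functions below are hypothetical stand-ins. Comparing the raw
       option value against the cloudpickle constant keeps loading the main
       session after the default changes; resolving the option first does not.

```python
# Assumed constant values; stand-ins for the pickler module's constants.
USE_CLOUDPICKLE = 'cloudpickle'
USE_DILL = 'dill'


def resolve_pickle_library(option_value, default_library):
    """Map the hypothetical 'default' placeholder to a concrete library."""
    return default_library if option_value == 'default' else option_value


def should_load_main_session_raw(option_value):
    # Mirrors the check in the diff: compares the unresolved option value.
    return option_value != USE_CLOUDPICKLE


def should_load_main_session(option_value, default_library):
    # Resolves the placeholder first, so a changed default is respected.
    resolved = resolve_pickle_library(option_value, default_library)
    return resolved != USE_CLOUDPICKLE
```

       With the default switched to cloudpickle, the raw check still asks for
       a main session when the option is left at 'default', while the
       resolved check does not.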

##########
File path: sdks/python/apache_beam/runners/portability/stager.py
##########
@@ -341,9 +343,11 @@ def create_job_resources(options,  # type: PipelineOptions
       pickled_session_file = os.path.join(
           temp_dir, names.PICKLED_MAIN_SESSION_FILE)
       pickler.dump_session(pickled_session_file)
-      resources.append(
-          Stager._create_file_stage_to_artifact(
-              pickled_session_file, names.PICKLED_MAIN_SESSION_FILE))
+      # for pickle_library: cloudpickle, dump_session is no op
+      if os.path.exists(pickled_session_file):

Review comment:
       I like the way you did this.  Is it possible to add a unit test for this 
behavior?
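
       A rough sketch of what such a test could look like. It does not call
       the real Stager: stage_main_session is a hypothetical stand-in that
       copies the logic from the diff, the file name is an assumed value, and
       the two dump functions are fakes for the dill and cloudpickle cases.

```python
import os
import tempfile
import unittest

# Assumed value; stands in for names.PICKLED_MAIN_SESSION_FILE.
PICKLED_MAIN_SESSION_FILE = 'pickled_main_session'


def stage_main_session(temp_dir, dump_session):
    """Hypothetical stand-in for the staging step shown in the diff."""
    resources = []
    pickled_session_file = os.path.join(temp_dir, PICKLED_MAIN_SESSION_FILE)
    dump_session(pickled_session_file)
    # Only stage the file if dump_session actually wrote it.
    if os.path.exists(pickled_session_file):
        resources.append((pickled_session_file, PICKLED_MAIN_SESSION_FILE))
    return resources


def noop_dump_session(path):
    """Fake for the cloudpickle case: writes nothing."""


def writing_dump_session(path):
    """Fake for the dill case: writes a session file."""
    with open(path, 'wb') as f:
        f.write(b'session')


class StageMainSessionTest(unittest.TestCase):
    def test_noop_dump_stages_nothing(self):
        with tempfile.TemporaryDirectory() as temp_dir:
            self.assertEqual(
                stage_main_session(temp_dir, noop_dump_session), [])

    def test_written_session_is_staged(self):
        with tempfile.TemporaryDirectory() as temp_dir:
            resources = stage_main_session(temp_dir, writing_dump_session)
            self.assertEqual(len(resources), 1)
            self.assertEqual(resources[0][1], PICKLED_MAIN_SESSION_FILE)
```

       In the real test the fakes would likely be replaced by selecting the
       pickle library through the pipeline options before calling
       create_job_resources.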






Issue Time Tracking
-------------------

    Worklog Id:     (was: 713901)
    Time Spent: 2h 20m  (was: 2h 10m)

> Disable save_main_session when using cloudpickle
> ------------------------------------------------
>
>                 Key: BEAM-13595
>                 URL: https://issues.apache.org/jira/browse/BEAM-13595
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py-core
>            Reporter: Ryan Thompson
>            Assignee: Anand Inguva
>            Priority: P3
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> [save_main_session|https://github.com/apache/beam/blob/17b62ad9e050f80a88793457aee710ea4711d47b/sdks/python/apache_beam/options/pipeline_options.py#L1089]
>  is a flag in the Python SDK that is used by the dill library to save all 
> classes/variables/lambdas in memory.
>  
> When the cloudpickle library is used, no session is saved (it is a no-op). 
> However, if the runner sees the save_main_session option set, it may try to 
> access/move/manage the saved main session file.
>  
> To avoid this, when cloudpickle is the pickle library, save_main_session 
> should be false.
>  
> See also:
> https://issues.apache.org/jira/browse/BEAM-13386
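
The rule proposed in the description can be sketched as follows. This is an illustration only: the options are modeled as a plain dict, and effective_save_main_session is a hypothetical helper, not a function in the Beam SDK.

```python
def effective_save_main_session(options):
    """Return the save_main_session value a runner should act on.

    `options` is a plain dict standing in for pipeline options.
    """
    # With cloudpickle no session file is written, so the flag is forced off.
    if options.get('pickle_library') == 'cloudpickle':
        return False
    return bool(options.get('save_main_session', False))
```

Under this sketch a runner never looks for a main session file when cloudpickle is selected, even if the user set save_main_session.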



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
