damccorm opened a new issue, #21615: URL: https://github.com/apache/beam/issues/21615
Context: In the Python SDK, you can specify the pipeline option --pickle_library, which dictates which library is used to pickle variables when sending them from the executing machine to the workers (when save_main_session is True).

Issue: The pickle_library option is ignored in the pipeline.run() function, which falls back to dill (the default): https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline.py#L570

Reproduce: Add --pickle_library cloudpickle to the pipeline options and notice that dill is still used for the session dump, even though cloudpickle was requested. I found this because dill throws an exception for my use case, but cloudpickle doesn't.

Imported from Jira [BEAM-14514](https://issues.apache.org/jira/browse/BEAM-14514). Original Jira may contain additional context.
Reported by: dctelus.
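To make the reported behavior concrete, here is a minimal stdlib-only sketch of the pattern. It is not Beam's actual code: parse_options, dump_session_ignoring_option, and dump_session_honoring_option are hypothetical names, and the serializers are represented only by their names so that the choice of library can be observed without installing dill or cloudpickle.

```python
import argparse

# Assumption: serializers are modeled by name only; nothing is pickled here.
SUPPORTED_PICKLERS = ("cloudpickle", "dill")
DEFAULT_PICKLER = "dill"


def parse_options(argv):
    """Parse the two flags mentioned in the report (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--pickle_library",
        choices=SUPPORTED_PICKLERS,
        default=DEFAULT_PICKLER,
    )
    parser.add_argument("--save_main_session", action="store_true")
    return parser.parse_args(argv)


def dump_session_ignoring_option(options):
    """Models the reported behavior: the default is always used."""
    if not options.save_main_session:
        return None
    return DEFAULT_PICKLER  # options.pickle_library is never consulted


def dump_session_honoring_option(options):
    """Models the expected behavior: the requested library is used."""
    if not options.save_main_session:
        return None
    return options.pickle_library


options = parse_options(
    ["--pickle_library", "cloudpickle", "--save_main_session"]
)
used_when_ignored = dump_session_ignoring_option(options)
used_when_honored = dump_session_honoring_option(options)
print("ignoring option:", used_when_ignored)
print("honoring option:", used_when_honored)
```

In this model the first function reports dill even though cloudpickle was requested, which mirrors the symptom described above; the second shows the behavior the reporter expected.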
