dctelus created BEAM-14514:
------------------------------
Summary: Beam Python SDK ignores pickle_library option in pipeline.run()
Key: BEAM-14514
URL: https://issues.apache.org/jira/browse/BEAM-14514
Project: Beam
Issue Type: Bug
Components: sdk-py-core
Affects Versions: 2.38.0
Reporter: dctelus
Context:
In the Python SDK, you can specify the pipeline option --pickle_library, which
determines which library is used to pickle variables so they can be sent from
the executing machine to the workers (when save_main_session is True).
Issue:
The pickle_library option is ignored in the pipeline.run() function, which
falls back to dill (the default library).
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline.py#L570
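
The difference between the reported and the expected behavior can be sketched
with the standard library alone. This is only an illustration, not Beam's
actual code: parse_pickle_library, session_pickler_observed and
session_pickler_expected are hypothetical names, and argparse stands in for
Beam's own option parsing.

```python
import argparse

# Per this issue, dill is the library used when nothing else is requested.
DEFAULT_PICKLER = "dill"


def parse_pickle_library(argv):
    """Hypothetical helper: read --pickle_library from pipeline-style args."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--pickle_library", default=DEFAULT_PICKLER)
    # Other flags (e.g. --save_main_session) are ignored by this sketch.
    known, _unknown = parser.parse_known_args(argv)
    return known.pickle_library


def session_pickler_observed(argv):
    """Reported behavior: the option is parsed but never consulted."""
    parse_pickle_library(argv)
    return DEFAULT_PICKLER


def session_pickler_expected(argv):
    """Expected behavior: the session dump uses the requested library."""
    return parse_pickle_library(argv)


argv = ["--pickle_library", "cloudpickle", "--save_main_session"]
print(session_pickler_observed(argv))  # dill
print(session_pickler_expected(argv))  # cloudpickle
```

In this sketch the only difference between the two functions is whether the
parsed value is actually used when choosing the pickler for the session dump.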
Reproduce:
Add --pickle_library cloudpickle to the pipeline options and observe that dill
is still used for the session dump, even though cloudpickle was requested.
I found this out because dill throws an exception for my use case, but
cloudpickle doesn't.