dctelus created BEAM-14514:
------------------------------

             Summary: Beam Python SDK ignores pickle_library option in pipeline.run()
                 Key: BEAM-14514
                 URL: https://issues.apache.org/jira/browse/BEAM-14514
             Project: Beam
          Issue Type: Bug
          Components: sdk-py-core
    Affects Versions: 2.38.0
            Reporter: dctelus


Context:

In the Python SDK, you can set the pipeline option --pickle_library, which
selects the library used to pickle variables when sending them from the
launching machine to the workers (when save_main_session is True).

Issue:

The pickle_library option is ignored by pipeline.run(), which falls back to
dill (the default library).

https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline.py#L570

Reproduce:

Add --pickle_library cloudpickle to the pipeline options and observe that dill
is still used for the main-session dump, even though cloudpickle was requested.
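The expected versus reported behavior can be sketched with a small,
standard-library-only model. This is NOT Beam's code: the Options class, the
BACKENDS registry, the stand-in backends, and the run_reported/run_expected
functions are all hypothetical, and real dill/cloudpickle are not involved.
It only shows that the session dump should be routed through the library
named by pickle_library instead of the hard-wired default.

```python
import io
import pickle
from dataclasses import dataclass

# Hypothetical default, mirroring the report that dill is the SDK's default.
DEFAULT_PICKLE_LIB = "dill"


def _fake_backend(name: str):
    """Build a stand-in 'dump_session' that records which backend ran.

    Real dill/cloudpickle are not used: each stand-in pickles a small dict
    with the standard-library pickle module and returns its own name.
    """
    def dump_session(buf: io.BytesIO) -> str:
        pickle.dump({"backend": name}, buf)
        return name
    return dump_session


# Hypothetical registry mapping --pickle_library values to backends.
BACKENDS = {name: _fake_backend(name) for name in ("dill", "cloudpickle")}


@dataclass
class Options:
    """Hypothetical subset of the pipeline options involved in this bug."""
    save_main_session: bool = False
    pickle_library: str = DEFAULT_PICKLE_LIB


def run_reported(options: Options):
    """Models the reported behavior: pickle_library is ignored."""
    if not options.save_main_session:
        return None
    return BACKENDS[DEFAULT_PICKLE_LIB](io.BytesIO())


def run_expected(options: Options):
    """Models the expected behavior: the configured library is used."""
    if not options.save_main_session:
        return None
    return BACKENDS[options.pickle_library](io.BytesIO())


if __name__ == "__main__":
    opts = Options(save_main_session=True, pickle_library="cloudpickle")
    print(run_reported(opts))  # dill
    print(run_expected(opts))  # cloudpickle
```

With save_main_session enabled and pickle_library set to cloudpickle, the
"reported" path still picks dill while the "expected" path picks cloudpickle.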

I found this because dill raises an exception for my use case, whereas
cloudpickle does not.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)