claudevdm opened a new issue, #34903:
URL: https://github.com/apache/beam/issues/34903

   ### What needs to happen?
   
   As of Beam 2.65.0, cloudpickle is the default `pickle_library`; the 
previous default was dill. See https://s.apache.org/beam-cloudpickle-next-steps 
for background.
   
   This can cause breakages where the behavior of dill and cloudpickle 
diverges.
   
   
The [cloudpickle_pickler_test](https://github.com/apache/beam/blob/b9fa49a9827dd28349e382f479ebd1a8bbe27d07/sdks/python/apache_beam/internal/cloudpickle_pickler_test.py)
 tests demonstrate how cloudpickle behaves in various cases. Notable 
behavior includes:
   
   1. [Globals defined in the `__main__` module are pickled by 
value](https://github.com/apache/beam/blob/b9fa49a9827dd28349e382f479ebd1a8bbe27d07/sdks/python/apache_beam/internal/cloudpickle_pickler_test.py#L51)
   2. [Globals defined in importable modules are pickled by 
reference](https://github.com/apache/beam/blob/b9fa49a9827dd28349e382f479ebd1a8bbe27d07/sdks/python/apache_beam/internal/cloudpickle_pickler_test.py#L54)
   3. [Module-aliased globals are pickled by 
value](https://github.com/apache/beam/blob/b9fa49a9827dd28349e382f479ebd1a8bbe27d07/sdks/python/apache_beam/internal/cloudpickle_pickler_test.py#L68)
   4. All functions and classes defined in the `__main__` module are pickled 
by value.
   5. All closures and dynamic types are pickled by value.
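
To make "pickled by reference" concrete, here is a minimal sketch using the standard-library `pickle` module as a stand-in, so that it runs without extra dependencies. It is an assumption of this sketch that cloudpickle treats objects from importable modules the same way; `json.dumps` is just an arbitrary importable function.

```python
import json
import pickle

# Pickling an object from an importable module stores only a reference:
# the module name and the attribute name, not the function's code.
payload = pickle.dumps(json.dumps)
stores_module_name = b"json" in payload
stores_attr_name = b"dumps" in payload

# Unpickling re-imports the module and looks the name up again, so the
# result is the very same object and module-level state stays shared.
restored = pickle.loads(payload)
same_object = restored is json.dumps
```

Pickling by value, in contrast, embeds the definition itself in the payload, so the unpickled object is a fresh copy that does not share state with the original.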
   
   
   Known issues include:
   
   - Unit tests that rely on globals will fail. Cloudpickle assumes the 
`__main__` module is not available in the unpickling environment and therefore 
redefines globals when unpickling. To fix tests that rely on globals, use the 
`apache_beam.utils.shared` module as shown in 
https://github.com/apache/beam/blob/b9fa49a9827dd28349e382f479ebd1a8bbe27d07/sdks/python/apache_beam/internal/cloudpickle_pickler_test.py#L54
   - Closures and dynamic classes that reference unpicklable objects fail to 
pickle. This can be fixed by defining functions at the top level and binding 
arguments with `functools.partial` when necessary.
   - Types that cloudpickle cannot pickle should instead be defined in an 
importable module, in which case they are pickled by reference.
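
The `functools.partial` workaround for closures can be sketched as follows. All names here (`client`, `make_tagger_closure`, `tag_with_prefix`) are hypothetical and chosen only for illustration; standard-library `pickle` is used solely to show that the captured object is unpicklable.

```python
import functools
import pickle
import threading
import types

# Hypothetical stand-in for an object that cannot be pickled:
# it carries a lock, and locks are not picklable.
client = types.SimpleNamespace(prefix="user", lock=threading.Lock())


def make_tagger_closure(client):
    # Problematic pattern: the closure captures all of `client`,
    # including the lock, even though it only needs `client.prefix`.
    def tag(element):
        return f"{client.prefix}:{element}"
    return tag


def tag_with_prefix(prefix, element):
    # Top-level function that depends only on its arguments.
    return f"{prefix}:{element}"


# The closure drags the whole client along in a closure cell.
closure = make_tagger_closure(client)
captured = closure.__closure__[0].cell_contents
try:
    pickle.dumps(captured)
    captured_is_picklable = True
except TypeError:
    captured_is_picklable = False

# Bind only the plain string that the function actually needs.
tagger = functools.partial(tag_with_prefix, client.prefix)
```

Here `tagger` holds a top-level function plus a string, so nothing unpicklable travels with it, and it can be passed wherever the closure was used (for example, to a `Map` transform).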
   
   Please report any new issues on this tracking bug. For any breakages that 
require reverting to dill, specify `pickle_library=dill`.
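
As a hedged sketch, the option can be passed as a pipeline flag. `PIPELINE_ARGS` and `build_options` are hypothetical helper names; the Beam import is kept inside the function so the snippet can be read and loaded without Beam installed.

```python
# Flag that selects dill instead of the new cloudpickle default.
PIPELINE_ARGS = ["--pickle_library=dill"]


def build_options(extra_args=()):
    # Imported lazily; requires apache_beam to be installed when called.
    from apache_beam.options.pipeline_options import PipelineOptions

    return PipelineOptions([*PIPELINE_ARGS, *extra_args])
```

The same flag can be appended to the command line of an existing pipeline script, for example `--pickle_library=dill` after the runner arguments.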
   
   
   
   ### Issue Priority
   
   Priority: 2 (default / most normal work should be filed as P2)
   
   ### Issue Components
   
   - [x] Component: Python SDK
   - [ ] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Infrastructure
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


