claudevdm opened a new issue, #34903: URL: https://github.com/apache/beam/issues/34903
### What needs to happen?

Cloudpickle is set as the default `pickle_library` in 2.65.0; the previous default was dill. See https://s.apache.org/beam-cloudpickle-next-steps for background.

This can cause breakages in cases where the behavior of dill and cloudpickle diverges.

The [cloudpickle_pickler_test](https://github.com/apache/beam/blob/b9fa49a9827dd28349e382f479ebd1a8bbe27d07/sdks/python/apache_beam/internal/cloudpickle_pickler_test.py) tests demonstrate the behavior of cloudpickle in various cases. Notable behavior includes:

1. [Globals defined in the `__main__` module are pickled by value](https://github.com/apache/beam/blob/b9fa49a9827dd28349e382f479ebd1a8bbe27d07/sdks/python/apache_beam/internal/cloudpickle_pickler_test.py#L51)
2. [Globals defined in importable modules are pickled by reference](https://github.com/apache/beam/blob/b9fa49a9827dd28349e382f479ebd1a8bbe27d07/sdks/python/apache_beam/internal/cloudpickle_pickler_test.py#L54)
3. [Module-aliased globals are pickled by value](https://github.com/apache/beam/blob/b9fa49a9827dd28349e382f479ebd1a8bbe27d07/sdks/python/apache_beam/internal/cloudpickle_pickler_test.py#L68)
4. All functions and classes defined in the `__main__` module are pickled by value.
5. All closures and dynamic types are pickled by value.

Known issues include:

- Unit tests that rely on globals will fail. Cloudpickle assumes the `__main__` module is not available in the unpickling environment and therefore redefines globals. To fix tests that rely on globals, use the `apache_beam.utils.shared` module as shown in https://github.com/apache/beam/blob/b9fa49a9827dd28349e382f479ebd1a8bbe27d07/sdks/python/apache_beam/internal/cloudpickle_pickler_test.py#L54
- Closures and dynamic classes that reference unpicklable objects fail.
This can be fixed by defining functions at the top level and binding arguments with `functools.partial` when necessary.
- When encountering types that cloudpickle cannot pickle, define them in an importable module instead, in which case they will be pickled by reference.

Please report any new issues on this tracking bug. For any breakages that require reverting to dill, specify `pickle_library=dill`.

### Issue Priority

Priority: 2 (default / most normal work should be filed as P2)

### Issue Components

- [x] Component: Python SDK
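
To illustrate the `functools.partial` workaround listed under the known issues, here is a minimal sketch. It uses the standard-library `pickle` module as a stand-in rather than Beam's cloudpickle-based pickler, and the names `scale` and `make_scaler_closure` are hypothetical, chosen only for this example. The idea is that a closure pickled by value drags its captured objects along, while a top-level function wrapped in `functools.partial` carries only the arguments bound explicitly.

```python
import functools
import pickle
import threading


def scale(value, factor):
    """Top-level function: it can be pickled by reference to its module."""
    return value * factor


def make_scaler_closure(factor):
    """Problematic pattern: the returned closure captures an unpicklable lock."""
    lock = threading.Lock()  # stands in for any unpicklable object

    def scaler(value):
        with lock:
            return value * factor

    # A pickler that serializes closures by value must also serialize `lock`.
    return scaler


# Workaround: bind only plain, picklable data to the top-level function.
bound = functools.partial(scale, factor=3)
restored = pickle.loads(pickle.dumps(bound))
result = restored(5)

# The captured object itself is what cannot be serialized.
try:
    pickle.dumps(threading.Lock())
    lock_is_picklable = True
except TypeError:
    lock_is_picklable = False
```

If the function genuinely needs an unpicklable resource such as a lock or a client connection, one option is to create it inside the function body (or lazily on first use) instead of capturing it from an enclosing scope.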