Hi Beam Devs,

I am making progress on switching Beam's default pickling library to
cloudpickle and removing the strict dependency on dill, as outlined in
https://s.apache.org/beam-cloudpickle-next-steps.

The current plan is to:

1. Make cloudpickle the default library in the Beam 2.65.0 release (see
https://github.com/apache/beam/pull/34695). Users will still be able to
specify pickle_library='dill' without installing anything extra. There will
still be a hard dependency on dill (blocked by step 2), but it is a step in
the right direction.

2. Remove the strict dependency on dill in the Beam 2.66.0 release. Dill is
used directly by FastPrimitivesCoderImpl to encode types [1][2]. I'd prefer
to submit the fix for this after the branch cut so we have more time to
identify any issues.

Cloudpickle has some fundamentally different pickling behavior from dill,
which is likely to break:

   - Unit tests that rely on globals
      - This can be fixed by using apache_beam.utils.shared [3]
   - Closures and dynamic classes that reference unpicklable globals
      - This can be fixed by defining functions at the top level and using
        functools.partial to bind parameters if necessary
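Here is a minimal sketch of the second fix, using only the standard library. Plain `pickle` can serialize a function only by reference (module plus qualified name). That makes it a strict check: a callable that round-trips through it never depends on pickling a function's globals by value. `make_scaler` and `scale` are hypothetical names. The snippet assumes it is run as a script, so that `scale` is reachable from `__main__`:

```python
import functools
import pickle


def make_scaler(factor):
    """Closure style: the nested function captures `factor` from its enclosing scope."""
    def _scale(x):
        return x * factor
    return _scale


def scale(x, factor):
    """Top-level style: every input is an explicit parameter."""
    return x * factor


def round_trip(fn):
    """Serialize and deserialize `fn` with the standard-library pickle."""
    return pickle.loads(pickle.dumps(fn))


# A nested function is a local object, so pickle cannot refer to it by name.
try:
    round_trip(make_scaler(3))
    closure_pickled = True
except (pickle.PicklingError, AttributeError):
    closure_pickled = False

# functools.partial pickles as a reference to `scale` plus the bound keyword.
scale_by_3 = round_trip(functools.partial(scale, factor=3))

print(closure_pickled)  # False
print(scale_by_3(5))    # 15
```

Under cloudpickle, the closure would instead be pickled by value together with any globals it references. That is where an unpicklable global causes a failure. The partial form carries only `scale` plus its bound arguments, so those arguments must themselves be picklable.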


[1]
https://github.com/apache/beam/blob/b9fa49a9827dd28349e382f479ebd1a8bbe27d07/sdks/python/apache_beam/coders/coder_impl.py#L529

[2]
https://github.com/apache/beam/blob/b9fa49a9827dd28349e382f479ebd1a8bbe27d07/sdks/python/apache_beam/coders/coder_impl.py#L595

[3]
https://github.com/apache/beam/blob/b9fa49a9827dd28349e382f479ebd1a8bbe27d07/sdks/python/apache_beam/internal/cloudpickle_pickler_test.py#L54


I'd appreciate any feedback or concerns.


Best,

Claude
