tvalentyn commented on issue #22893: URL: https://github.com/apache/beam/issues/22893#issuecomment-1502354194
Thanks all, here is an update.

- Dill has made several breaking changes between versions 0.3.1.1 and 0.3.6 that affect what gets pickled and how: https://github.com/uqfoundation/dill/issues?q=is%3Aissue+regression
- Switching Apache Beam to a newer version of dill will very likely negatively affect some group of users, while it may be a non-issue for others.
- Dill has also changed internal code that Beam made assumptions about. Beam 2.47.0 will include code changes to work with `dill==0.3.6`: https://github.com/apache/beam/pull/26086. However, the default and required version of dill remains `dill==0.3.1.1` for now.
- `dill==0.3.1.1` doesn't support Python 3.11. For Beam, one of the primary motivations to upgrade dill is to support Python 3.11, in addition to the concerns in this bug. To unblock Python 3.11 support we went with monkey-patching dill 0.3.1.1 at runtime. The patch is applied only if the dill version is 0.3.1.1 and the Python version is 3.11 or higher: [`3d0ee7b` (#26121)](https://github.com/apache/beam/pull/26121/commits/3d0ee7b4ccbebe6069e0dca81d1bfe46381d546f). The alternative we considered was to vendor dill. I decided against vendoring at the last minute because dill makes changes to global state; for example, it modifies the global dispatch table used by the standard pickler. Given the demand for this issue, it is clear that newer versions of dill will be installed in addition to the vendored version. The vendored version and a stock version installed at runtime may modify the global state differently. I didn't have enough time before the 2.47.0 release cut to properly evaluate and address the risk of such concurrent modification.
- I have also evaluated setting cloudpickle as the default pickler, and encountered one issue that warranted additional investigation: https://github.com/apache/beam/issues/26209.
- I plan to have a conversation with the dill maintainers to see if we can mitigate the impact of the breaking changes they have introduced, so that we can update smoothly or switch to a different default pickler.

In the meantime, with Beam 2.47.0, users can try updating to a newer version of dill, even though Beam requires dill 0.3.1.1. Users can force-install a newer version of dill in their submission environment ***as long as they install the same version of dill in the runtime environment***. As I mentioned above, some users may not be affected by dill's breaking changes while others may be. Dill's breaking changes are not something Beam controls, but Beam did make code changes to work with newer versions of dill.

I will also continue to work on a better solution for this issue.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
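The two version conditions described in the comment (the runtime patch applying only to dill 0.3.1.1 on Python 3.11 or higher, and the requirement that the submission and runtime environments use the same dill version) can be sketched roughly as below. This is an illustrative sketch only, not Beam's actual code: the helper names `installed_dill_version`, `patch_applies`, and `check_same_dill` are hypothetical, and how the runtime environment's dill version is obtained (for example, by logging it from a worker) is left as an assumption.

```python
import sys
from importlib import metadata

# Version that Beam 2.47.0 requires by default, per the comment above.
PINNED_DILL = "0.3.1.1"


def installed_dill_version():
    """Return the installed dill version string, or None if dill is absent."""
    try:
        return metadata.version("dill")
    except metadata.PackageNotFoundError:
        return None


def patch_applies(dill_version, python_version=sys.version_info[:2]):
    """Hypothetical mirror of the stated condition:
    dill is exactly 0.3.1.1 and Python is 3.11 or higher."""
    return dill_version == PINNED_DILL and tuple(python_version) >= (3, 11)


def check_same_dill(submission_version, runtime_version):
    """Raise if the submission and runtime environments disagree on dill."""
    if submission_version != runtime_version:
        raise RuntimeError(
            f"dill mismatch: submission has {submission_version}, "
            f"runtime has {runtime_version}"
        )


if __name__ == "__main__":
    local = installed_dill_version()
    print("installed dill:", local)
    print("runtime patch condition holds:", patch_applies(local))
```

A check like `check_same_dill` could be run before job submission, given the version reported by the runtime environment, to catch a mismatch early rather than at unpickling time.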
