jrwalk opened a new issue #7870: migrate from `dill` to `cloudpickle` for advanced serialization
URL: https://github.com/apache/airflow/issues/7870

**Description**

The `PythonVirtualenvOperator` could use `cloudpickle` instead of `dill` as its optional serialization library. This should be a mostly drop-in replacement.

**Use case / motivation**

Currently, the `PythonVirtualenvOperator` optionally uses `dill` in place of stock `pickle` to serialize advanced types. However, most major distributed compute frameworks have shifted to `cloudpickle`. Using `dill` in Airflow can therefore introduce redundant dependencies when calling out to other distributed compute (e.g., farming compute-heavy tasks out to a remote `dask` cluster), and can interfere with serialization of tasks for those tools. Since both `dill` and `cloudpickle` are largely drop-in replacements for `pickle`, the migration should be fairly minor.

**Related Issues**

https://github.com/kubeflow/pipelines/issues/1387
https://github.com/dask/distributed/issues/3606
https://github.com/RaRe-Technologies/gensim/issues/558#issuecomment-217445542
https://github.com/uqfoundation/multiprocess/issues/22#issuecomment-243120410
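
To illustrate the "drop-in replacement" claim in the description: `pickle`, `dill`, and `cloudpickle` all expose module-level `dumps` and `loads`, so the serializer can in principle be selected by module name. The following is only a minimal sketch under that assumption. The helper names `load_serializer` and `round_trip` are hypothetical and are not part of Airflow or of any of these libraries, and the silent fallback to `pickle` is a simplification for the example.

```python
import importlib
import pickle


def load_serializer(name="pickle"):
    """Return a module exposing pickle-style dumps/loads.

    Hypothetical helper: falls back to the standard-library pickle
    when the requested module (e.g. dill, cloudpickle) is not installed.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return pickle


def round_trip(obj, serializer=pickle):
    """Serialize and deserialize obj through the shared dumps/loads interface."""
    return serializer.loads(serializer.dumps(obj))


if __name__ == "__main__":
    payload = {"a": [1, 2, 3]}
    for name in ("pickle", "dill", "cloudpickle"):
        serializer = load_serializer(name)
        # serializer.__name__ shows which module was actually used.
        print(name, "->", serializer.__name__, round_trip(payload, serializer))
```

Because the calling code only touches `dumps` and `loads`, swapping one library for another is a change to the module name rather than to the call sites. This does not cover behavioural differences between the libraries for specific object types.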
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services