jrwalk opened a new issue #7870: migrate from `dill` to `cloudpickle` for 
advanced serialization
URL: https://github.com/apache/airflow/issues/7870
 
 
   **Description**
   
   The optional `dill` serialization in `PythonVirtualenvOperator` could be 
replaced with `cloudpickle`.  This should be a mostly drop-in replacement.
   
   **Use case / motivation**
   
   Currently, the `PythonVirtualenvOperator` optionally uses `dill` in place of 
stock `pickle` to serialize advanced types.  However, most major distributed 
compute frameworks have shifted to `cloudpickle`.  Using `dill` in Airflow can 
therefore introduce a redundant dependency when calling out to other distributed 
compute tools (e.g., farming compute-heavy tasks out to a remote `dask` 
cluster), and can interfere with how those tools serialize their tasks.
   
   Since both `dill` and `cloudpickle` are largely drop-in replacements for 
`pickle`, the migration should be fairly minor.
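
   As a rough illustration of the "drop-in" claim: `pickle`, `dill`, and 
`cloudpickle` all expose `dumps` and `loads`, so the serializer can in principle 
be swapped by changing which module is imported.  The sketch below is not 
Airflow code; `load_serializer` and `roundtrip` are hypothetical helper names, 
and it falls back to stock `pickle` when the preferred library is not installed.

```python
import importlib
import pickle


def load_serializer(preferred="cloudpickle"):
    """Return the named module if it is importable, else stock pickle.

    Hypothetical helper for illustration only; not part of Airflow.
    """
    try:
        return importlib.import_module(preferred)
    except ImportError:
        return pickle


def roundtrip(obj, serializer):
    """Serialize and deserialize obj using the same module's dumps/loads."""
    return serializer.loads(serializer.dumps(obj))


serializer = load_serializer("cloudpickle")

# Plain data round-trips with any of the three modules.
assert roundtrip({"a": 1}, serializer) == {"a": 1}

# Lambdas are one of the "advanced types" stock pickle cannot handle,
# so only try this when an extended serializer was actually found.
if serializer is not pickle:
    double = roundtrip(lambda x: 2 * x, serializer)
    assert double(21) == 42
```

   Passing `"dill"` instead of `"cloudpickle"` to the helper exercises the same 
code path, which is the sense in which the migration is expected to be small.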
   
   **Related Issues**
   
   https://github.com/kubeflow/pipelines/issues/1387
   
   https://github.com/dask/distributed/issues/3606
   
   https://github.com/RaRe-Technologies/gensim/issues/558#issuecomment-217445542
   
   https://github.com/uqfoundation/multiprocess/issues/22#issuecomment-243120410

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
