moiseenkov commented on code in PR #38531:
URL: https://github.com/apache/airflow/pull/38531#discussion_r1559365639
##########
airflow/models/taskinstance.py:
##########
@@ -1287,7 +1287,7 @@ class TaskInstance(Base, LoggingMixin):
queued_dttm = Column(UtcDateTime)
queued_by_job_id = Column(Integer)
pid = Column(Integer)
- executor_config = Column(ExecutorConfigType(pickler=dill))
+ executor_config = Column(ExecutorConfigType(pickler=cloudpickle))
Review Comment:
@potiuk ,
If we can pre-install both the `dill` and `cloudpickle` packages, I think we
could implement an optional feature as follows:
1. Specify an option for the package somewhere and describe it in the
documentation. The default value is `cloudpickle`.
2. Add a migration that reformats the models (`TaskInstance`, `DagPickle`) so
they also store which package serialized the value (in our case,
dill==3.1.1). For instance, the new format could be:
```python
{
"pickler": {"name": "dill", "version": "3.1.1"},
"object": <object>
}
```
After the migration is complete, we always know which package serialized
each value. This information will help us roll back the migration properly
and reformat existing objects whenever users switch from one package to
another.
3. Integrate into the Airflow startup process a check for the selected
serialization package, and run a script that migrates saved objects from one
pickler package to the other.
4. Adjust `ExecutorConfigType` to support the new storage format.
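To make the idea concrete, steps 2–4 could be sketched roughly as below. This is not actual Airflow code: `wrap_serialize`, `wrap_deserialize`, and `repickle` are hypothetical helper names, and plain `pickle` stands in for `dill`/`cloudpickle` (all three expose the same `dumps`/`loads` interface):

```python
import importlib

# Hypothetical helpers illustrating the wrapped storage format from step 2.
# The pickler is resolved by module name, so "dill", "cloudpickle", or the
# stdlib "pickle" can all be plugged in.

def wrap_serialize(obj, pickler_name="cloudpickle"):
    """Serialize obj and record which pickler produced the payload."""
    pickler = importlib.import_module(pickler_name)
    return {
        "pickler": {
            "name": pickler_name,
            # stdlib pickle has no __version__, hence the fallback
            "version": getattr(pickler, "__version__", "unknown"),
        },
        "object": pickler.dumps(obj),
    }

def wrap_deserialize(record):
    """Deserialize using whichever pickler the record declares."""
    pickler = importlib.import_module(record["pickler"]["name"])
    return pickler.loads(record["object"])

def repickle(record, target="cloudpickle"):
    """Re-serialize a stored record with the configured pickler (step 3)."""
    if record["pickler"]["name"] == target:
        return record  # already in the desired format, nothing to do
    return wrap_serialize(wrap_deserialize(record), pickler_name=target)
```

A migration script driven by the startup check would then just run `repickle` over the stored `executor_config` rows, and `ExecutorConfigType` (step 4) would call `wrap_serialize`/`wrap_deserialize` in its bind/result processors.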
WDYT?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]