moiseenkov commented on code in PR #38531:
URL: https://github.com/apache/airflow/pull/38531#discussion_r1559365639


##########
airflow/models/taskinstance.py:
##########
@@ -1287,7 +1287,7 @@ class TaskInstance(Base, LoggingMixin):
     queued_dttm = Column(UtcDateTime)
     queued_by_job_id = Column(Integer)
     pid = Column(Integer)
-    executor_config = Column(ExecutorConfigType(pickler=dill))
+    executor_config = Column(ExecutorConfigType(pickler=cloudpickle))

Review Comment:
   @potiuk ,
   If we pre-install both the `dill` and `cloudpickle` packages, I think we 
could implement this as an optional feature as follows:
   
   1. Add a configuration option that selects the pickler package and describe 
it in the documentation. The default value is `cloudpickle`.
   2. Add a migration that reformats the stored values for the models 
(`TaskInstance`, `DagPickle`) so they also record which package serialized them 
(in our case, `dill==3.1.1`). For instance, the new format could be:
   ```python
   {
     "pickler": {"name": "dill", "version": "3.1.1"},
     "object": <object>
   }
   ```
      After the migration is complete, we always know which package serialized 
the value. This information lets us roll back the migration properly and 
reformat existing objects whenever users switch from one package to another.
   
   3. Integrate a check into the Airflow startup process that detects which 
serialization package is selected, and run a script that migrates saved 
objects from one pickler package to the other.
   4. Adjust `ExecutorConfigType` to support the new storage format.
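   To make the envelope in step 2 concrete, here is a rough, hypothetical 
sketch (none of these names exist in Airflow today). Stdlib `pickle` stands in 
for both registry entries so the snippet is self-contained; in Airflow the 
registry would hold the real `dill` and `cloudpickle` modules:

   ```python
   import importlib.metadata
   import pickle

   # Stand-in registry: in Airflow this would map "dill" -> dill and
   # "cloudpickle" -> cloudpickle; stdlib pickle keeps the sketch runnable.
   PICKLERS = {"dill": pickle, "cloudpickle": pickle}


   def _pickler_version(name: str) -> str:
       try:
           return importlib.metadata.version(name)
       except importlib.metadata.PackageNotFoundError:
           return "unknown"  # the stdlib stand-in has no distribution metadata


   def wrap(value, pickler_name: str = "cloudpickle") -> dict:
       """Serialize `value` and record which pickler produced the bytes."""
       pickler = PICKLERS[pickler_name]
       return {
           "pickler": {"name": pickler_name, "version": _pickler_version(pickler_name)},
           "object": pickler.dumps(value),
       }


   def unwrap(envelope: dict):
       """Deserialize with the pickler recorded in the envelope itself."""
       pickler = PICKLERS[envelope["pickler"]["name"]]
       return pickler.loads(envelope["object"])
   ```

   Because the pickler name travels with the payload, `unwrap` never has to 
guess which package wrote the bytes, which is what makes rollback and 
repacking feasible.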
   
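   For step 3, the per-row body of that startup migration script might look 
roughly like this (again a hypothetical sketch, with stdlib `pickle` standing 
in for both packages and the version lookup omitted for brevity):

   ```python
   import pickle

   # Same stand-in as above: both names resolve to stdlib pickle here, but in
   # Airflow they would be the real dill and cloudpickle modules.
   PICKLERS = {"dill": pickle, "cloudpickle": pickle}


   def migrate_envelope(envelope: dict, target: str) -> dict:
       """Re-serialize one stored envelope with the configured pickler.

       Reads the payload with the pickler recorded in the envelope, then
       writes it back with the target pickler, updating the metadata.
       """
       if envelope["pickler"]["name"] == target:
           return envelope  # already in the configured format, nothing to do
       old = PICKLERS[envelope["pickler"]["name"]]
       new = PICKLERS[target]
       value = old.loads(envelope["object"])
       return {
           "pickler": {"name": target, "version": "unknown"},
           "object": new.dumps(value),
       }
   ```

   The startup check would just iterate over the stored rows and apply this to 
any envelope whose recorded pickler differs from the configured one.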
   WDYT?


