moiseenkov commented on code in PR #38531:
URL: https://github.com/apache/airflow/pull/38531#discussion_r1562712418


##########
airflow/models/taskinstance.py:
##########
@@ -1287,7 +1287,7 @@ class TaskInstance(Base, LoggingMixin):
     queued_dttm = Column(UtcDateTime)
     queued_by_job_id = Column(Integer)
     pid = Column(Integer)
-    executor_config = Column(ExecutorConfigType(pickler=dill))
+    executor_config = Column(ExecutorConfigType(pickler=cloudpickle))

Review Comment:
   @potiuk , hi.
   I looked into the [serde 
serialization](https://github.com/apache/airflow/blob/main/airflow/serialization/serde.py#L87),
 and if it was a right place to look at (please correct me if it wasn't), then 
I'm afraid its not a robust option. Unlike `dill` and `cloudpickle`, serde's 
`serialize()` method loses an object's type information similar to what 
`json.dumps()` does:
   ```python
   from airflow.serialization.serde import deserialize, serialize
   
   data = {1: 1}
   serialized = serialize(data)  # {'1': 1}
   deserialized = deserialize(serialized)  # {'1': 1}
   data == deserialized  # False
   ```
   Because we can't be sure that there are no more corner cases left, I think 
it's better to rely on some package supported by a dedicated team 
(`cloudpickle`). However, if I'm overcautious, please tell me that this isn't a 
critical issue for `TaskInstance.executor_config` and `DagPickle.pickle`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to