moiseenkov commented on code in PR #38531:
URL: https://github.com/apache/airflow/pull/38531#discussion_r1562712418
##########
airflow/models/taskinstance.py:
##########
@@ -1287,7 +1287,7 @@ class TaskInstance(Base, LoggingMixin):
queued_dttm = Column(UtcDateTime)
queued_by_job_id = Column(Integer)
pid = Column(Integer)
- executor_config = Column(ExecutorConfigType(pickler=dill))
+ executor_config = Column(ExecutorConfigType(pickler=cloudpickle))
Review Comment:
@potiuk , hi.
I looked into the [serde
serialization](https://github.com/apache/airflow/blob/main/airflow/serialization/serde.py#L87),
and if it was a right place to look at (please correct me if it wasn't), then
I'm afraid its not a robust option. Unlike `dill` and `cloudpickle`, serde's
`serialize()` method loses an object's type information similar to what
`json.dumps()` does:
```python
from airflow.serialization.serde import deserialize, serialize
data = {1: 1}
serialized = serialize(data) # {'1': 1}
deserialized = deserialize(serialized) # {'1': 1}
data == deserialized # False
```
Because we can't be sure that there are no more corner cases left, I think
it's better to rely on some package supported by a dedicated team
(`cloudpickle`). However, if I'm overcautious, please tell me that this isn't a
critical issue for `TaskInstance.executor_config` and `DagPickle.pickle`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]