potiuk commented on issue #40974: URL: https://github.com/apache/airflow/issues/40974#issuecomment-2250048420
The problem as I understand it is that we currently have two serializations:

1) https://github.com/apache/airflow/blob/main/airflow/serialization/serialized_objects.py -> this one uses the "old" serialization, where a giant if/else clause means we have to manually add every new type of object we want to serialize.
2) https://github.com/apache/airflow/blob/main/airflow/serialization/serde.py -> a pluggable and presumably much faster way of serializing. It uses the "pluggable" https://github.com/apache/airflow/tree/main/airflow/serialization/serializers modules, and I believe you can also add your own serializers or implement "serialize/deserialize" methods on your objects. It has no giant if, providers could potentially register their own serializers, etc.

Some of our code uses 1), some uses 2). There are a few problems there - for example, 2) does not yet support DAG serialization, and it is also not used in internal-api (but internal-api will be gone in Airflow 3). There are also other places where we use other serialization mechanisms (dill, cloudpickle), and there is a potential that we could consolidate those as well.

Ideally (this was the long-term vision of @bolkedebruin) we should have "one to rule them all" - 2) should become the universal serializer used by everything.
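To illustrate what the pluggable approach in 2) looks like from a user's perspective, here is a minimal sketch. The exact contract (the `__version__` attribute and the `serialize`/`deserialize` names and signatures) is my reading of the current `serde.py` code, so treat it as an assumption rather than a documented public API:

```python
# Sketch only: the __version__ attribute and the serialize()/deserialize()
# signatures are assumptions based on airflow/serialization/serde.py,
# not a documented public API.
from airflow.serialization.serde import deserialize, serialize


class Coordinates:
    # serde stores this version next to the data so deserialize()
    # can handle payloads written by older versions of the class.
    __version__ = 1

    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

    def serialize(self) -> dict:
        # Return a plain, JSON-friendly dict - no giant if/else needed.
        return {"x": self.x, "y": self.y}

    @staticmethod
    def deserialize(data: dict, version: int) -> "Coordinates":
        return Coordinates(data["x"], data["y"])


# serde.serialize() produces a tagged structure (class name, version, data);
# serde.deserialize() reconstructs the object from it.
encoded = serialize(Coordinates(1.0, 2.0))
restored = deserialize(encoded)
```

The point is that the type itself (or a registered serializer module) owns the conversion, so new types plug in without touching a central if/else chain.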
