Desdroid commented on issue #57726: URL: https://github.com/apache/airflow/issues/57726#issuecomment-3531762027
> I think we need to provide a way to handle this, either in the documentation or through a more “friendly” implementation.

Well, there is enough documentation on the pydantic side, but yes, we might need to remind people.

> I agree that arbitrary_types_allowed is a workaround, but what should users do with third-party classes that aren’t Pydantic models? We can patch Airflow classes, but there will always be cases involving other third-party types.

I might be wrong here, but pydantic is not a magic SerDe library for any type. Its docs clearly define which types are supported. If you deliberately choose to use arbitrary types with pydantic, it is your job to make sure SerDe is still possible. Airflow also doesn't serialize any third-party type you throw at it: you will get a SerializationException. Same with pydantic.

`ObjectStoragePath` could only be serialized by pydantic because it inherits from `pathlib.Path`, a type supported by pydantic (see https://docs.pydantic.dev/latest/concepts/serialization/#subclasses-of-supported-types).

Now there are two options to add SerDe on your own for the arbitrary types you are using.

1. In the pydantic world you would need to provide a [Serializer](https://docs.pydantic.dev/latest/concepts/serialization/#serializers) for your field, and then for deserialization you'd use a [BeforeValidator](https://docs.pydantic.dev/latest/concepts/validators/#field-validators).

   Here is an example that I think should also work today with Airflow:

   ```python
   import json

   from airflow.sdk import ObjectStoragePath
   from pydantic import BaseModel, field_serializer, field_validator


   class Foo:
       val: str

       def __init__(self, val: str):
           self.val = val

       def some_method(self):
           return self.val


   class Dummy(BaseModel):
       model_config = {"arbitrary_types_allowed": True}

       arbitrary_class: Foo
       # No need for a custom serializer, ObjectStoragePath inherits from pathlib.Path
       some_arbitrary_airflow_class: ObjectStoragePath

       @field_serializer("arbitrary_class", mode="plain")
       def serialize_arbitrary_class(self, v: Foo) -> str:
           return v.val

       @field_validator('some_arbitrary_airflow_class', mode='before')
       @classmethod
       def validate_some_arbitrary_airflow_class(cls, v):
           if isinstance(v, str):
               return ObjectStoragePath(v)
           return v

       @field_validator('arbitrary_class', mode='before')
       @classmethod
       def validate_arbitrary_class(cls, v):
           if isinstance(v, str):
               return Foo(v)
           return v


   def test_serde():
       a = Dummy(
           arbitrary_class=Foo("s3://foo"),
           some_arbitrary_airflow_class=ObjectStoragePath("s3://bucket/path/to/file"),
       )
       j = json.dumps(a.model_dump(mode="json"))
       b = Dummy.model_validate_json(j)
       assert type(a) == type(b)
   ```

2. Airflow also lets you provide custom SerDe logic via the `serialize()` and `deserialize(data, version: int)` methods. Unfortunately, until #58239 is implemented, you can't take this route if you use pydantic classes and need to use option 1 instead, because we currently apply the existing built-in SerDe for pydantic first, which uses `model_dump(mode='json')`.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
