Desdroid commented on issue #57726:
URL: https://github.com/apache/airflow/issues/57726#issuecomment-3531762027

   > I think we need to provide a way to handle this, either in the 
documentation or through a more “friendly” implementation. 
   
   Well, the pydantic docs already cover this well, but yes, we might need to 
remind people. 
   
   > I agree that arbitrary_types_allowed is a workaround, but what should 
users do with third-party classes that aren’t Pydantic models? We can patch 
Airflow classes, but there will always be cases involving other third-party 
types.
   
   I might be wrong here, but pydantic is not a magic SerDe library for any 
type. Its docs clearly define which types are supported. If you deliberately 
choose to use arbitrary types with pydantic, it is your job to make sure SerDe 
is still possible. Airflow doesn't serialize every third-party type you throw 
at it either: you will get a SerializationException. The same goes for 
pydantic. `ObjectStoragePath` could only be serialized by pydantic because it 
inherits from `pathlib.Path`, a type supported by pydantic (see 
https://docs.pydantic.dev/latest/concepts/serialization/#subclasses-of-supported-types).
   
   There are two ways to add your own SerDe for the arbitrary types you are 
using.
   
   1. In the pydantic world you would provide a 
[Serializer](https://docs.pydantic.dev/latest/concepts/serialization/#serializers)
 for your field, and for deserialization you'd use a 
[BeforeValidator](https://docs.pydantic.dev/latest/concepts/validators/#field-validators).
   Here is an example that I think should also work with Airflow today:
   ```python
   from airflow.sdk import ObjectStoragePath
   from pydantic import BaseModel, field_validator, field_serializer
   import json
   
   class Foo:
       val: str
   
       def __init__(self, val: str):
           self.val = val
   
       def some_method(self):
           return self.val
   
   
   class Dummy(BaseModel):
       model_config = {"arbitrary_types_allowed": True}
       arbitrary_class: Foo
       # No need for a custom serializer: ObjectStoragePath inherits
       # from pathlib.Path.
       some_arbitrary_airflow_class: ObjectStoragePath
   
       @field_serializer("arbitrary_class", mode="plain")
       def serialize_arbitrary_class(self, v: Foo) -> str:
           return v.val
   
       @field_validator('some_arbitrary_airflow_class', mode='before')
       @classmethod
       def validate_some_arbitrary_airflow_class(cls, v):
           if isinstance(v, str):
               return ObjectStoragePath(v)
           return v
   
       @field_validator('arbitrary_class', mode='before')
       @classmethod
       def validate_arbitrary_class(cls, v):
           if isinstance(v, str):
               return Foo(v)
           return v
   
   def test_serde():
       a = Dummy(
           arbitrary_class=Foo("s3://foo"),
           some_arbitrary_airflow_class=ObjectStoragePath("s3://bucket/path/to/file"),
       )
   
       j = json.dumps(a.model_dump(mode="json"))
       b = Dummy.model_validate_json(j)
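       assert type(a) is type(b)
       # The custom field round-trips through its serializer and validator.
       assert b.arbitrary_class.val == a.arbitrary_class.val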
   ```
   
   2. Airflow also lets you provide custom SerDe logic via the 
`serialize()` and `deserialize(data, version: int)` methods. Unfortunately, 
until #58239 is implemented, you can't take this route with pydantic classes 
and need to use option 1, because Airflow currently applies its built-in 
pydantic SerDe first, which uses `model_dump(mode='json')`.
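
For a plain (non-pydantic) class, the second option could look roughly like the sketch below. It only illustrates the shape of the `serialize()` / `deserialize(data, version)` pair mentioned above. The `__version__` class attribute and the `@staticmethod` form are my assumptions about the Airflow convention, and `round_trip` is a hypothetical helper that calls the hooks directly instead of going through Airflow's own serializer.

```python
from typing import ClassVar


class Foo:
    """Plain (non-pydantic) class that carries its own SerDe hooks."""

    # Assumed convention: a version tag that is handed back to deserialize().
    __version__: ClassVar[int] = 1

    def __init__(self, val: str):
        self.val = val

    def serialize(self) -> dict:
        # Return only JSON-friendly primitives.
        return {"val": self.val}

    @staticmethod
    def deserialize(data: dict, version: int) -> "Foo":
        # Refuse payloads written by a newer version of the class.
        if version > Foo.__version__:
            raise TypeError(f"cannot deserialize Foo version {version}")
        return Foo(data["val"])


def round_trip(obj: Foo) -> Foo:
    # Hypothetical helper: calls the hooks directly, without Airflow.
    payload = obj.serialize()
    return Foo.deserialize(payload, Foo.__version__)
```

Calling `round_trip(Foo("s3://foo"))` should give back a new `Foo` with the same `val`. As noted above, this route does not currently apply to pydantic models.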


