mykola-shyshov commented on issue #57726:
URL: https://github.com/apache/airflow/issues/57726#issuecomment-3506714758
Hello. This issue is quite tricky. Since the recent changes in #51059, serialization and deserialization of Pydantic models is handled entirely by Pydantic.
The four cases are explained below:
# Serialization Flows Summary
## ObjectStoragePath with mode='python' ✅
**Serialization:**
1. Pydantic serializer calls `model.model_dump(mode='python')`
2. Returns: `{'name': 'test', 'path': ObjectStoragePath(...)}` ← Object preserved
3. Airflow recursively serializes the dict
4. Finds ObjectStoragePath has `serialize()` method
5. Calls `ObjectStoragePath.serialize()`
6. Returns full type info: `{__classname__, __version__, __data__}`
**Deserialization:**
1. Sees `__classname__: ObjectStoragePath`
2. Calls `ObjectStoragePath.deserialize()`
3. Reconstructs ObjectStoragePath object ✅
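
The round trip above can be sketched with the standard library alone. This is a minimal illustration, not Airflow's actual serializer: `FakeStoragePath`, `REGISTRY`, `to_envelope`, and `from_envelope` are hypothetical names, and only the envelope keys `__classname__`, `__version__`, and `__data__` come from the steps above.

```python
from __future__ import annotations

from typing import Any


class FakeStoragePath:
    """Hypothetical stand-in for ObjectStoragePath (not the Airflow class)."""

    __version__ = 1

    def __init__(self, uri: str) -> None:
        self.uri = uri

    def serialize(self) -> dict[str, Any]:
        return {"uri": self.uri}

    @classmethod
    def deserialize(cls, data: dict[str, Any], version: int) -> FakeStoragePath:
        return cls(data["uri"])


# Hypothetical lookup table from class name to class.
REGISTRY = {"FakeStoragePath": FakeStoragePath}


def to_envelope(value: Any) -> Any:
    """Recursively wrap objects that expose serialize() in a typed envelope."""
    if isinstance(value, dict):
        return {key: to_envelope(item) for key, item in value.items()}
    if hasattr(value, "serialize"):
        return {
            "__classname__": type(value).__name__,
            "__version__": value.__version__,
            "__data__": value.serialize(),
        }
    return value


def from_envelope(value: Any) -> Any:
    """Rebuild objects from envelopes; leave everything else untouched."""
    if isinstance(value, dict):
        if "__classname__" in value:
            cls = REGISTRY[value["__classname__"]]
            return cls.deserialize(value["__data__"], value["__version__"])
        return {key: from_envelope(item) for key, item in value.items()}
    return value


# What a python-mode dump would hand over: the object itself is still present.
dumped = {"name": "test", "path": FakeStoragePath("s3://bucket/file.txt")}
encoded = to_envelope(dumped)
restored = from_envelope(encoded)
```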
---
## ObjectStoragePath with mode='json' ❌ (CURRENT - BROKEN)
**Serialization:**
1. Pydantic serializer calls `model.model_dump(mode='json')`
2. Pydantic converts ObjectStoragePath → string
3. Returns: `{'name': 'test', 'path': 's3://bucket/file.txt'}` ← Lost object!
4. Airflow sees only a string, no type info saved
**Deserialization:**
1. Calls `PathModel.model_validate({'path': 's3://...'})`
2. Pydantic expects ObjectStoragePath but receives string
3. No validator to convert string → ObjectStoragePath
4. Field stays as string ❌
**Problem:** Pydantic strips type info in step 2, Airflow never sees the object.
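
The loss of type information can be illustrated without Pydantic. In this rough sketch, `FakeStoragePath`, `dump_json_like`, and `restore` are hypothetical helpers that only mimic the effect described above: once the object has been flattened to a string, there is no marker left for a deserializer to act on.

```python
class FakeStoragePath:
    """Hypothetical stand-in for ObjectStoragePath (not the Airflow class)."""

    def __init__(self, uri):
        self.uri = uri

    def __str__(self):
        return self.uri


PRIMITIVES = (str, int, float, bool, type(None))


def dump_json_like(fields):
    """Mimic the json-mode effect: any non-primitive value becomes a string."""
    return {
        key: value if isinstance(value, PRIMITIVES) else str(value)
        for key, value in fields.items()
    }


def restore(fields):
    """Rebuild only values that carry a type marker; strings pass through."""
    restored = {}
    for key, value in fields.items():
        if isinstance(value, dict) and value.get("__classname__") == "FakeStoragePath":
            restored[key] = FakeStoragePath(value["__data__"])
        else:
            restored[key] = value
    return restored


original = {"name": "test", "path": FakeStoragePath("s3://bucket/file.txt")}
dumped = dump_json_like(original)   # the path is now a plain str
restored = restore(dumped)          # nothing marks it as a path any more
```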
---
## AnyUrl with mode='python' ❌ (ISSUE #56736)
**Serialization:**
1. Pydantic serializer calls `model.model_dump(mode='python')`
2. Returns: `{'name': 'test', 'url': AnyUrl(...)}` ← Object preserved
3. Airflow recursively serializes the dict
4. Checks if AnyUrl has `serialize()` method → NO
5. Checks if AnyUrl is dataclass → NO
6. Checks if AnyUrl has attrs → NO
7. Raises error: "cannot serialize AnyUrl" ❌
**Problem:** No Airflow serializer registered for AnyUrl.
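
The chain of checks in steps 4 to 7 can be sketched as follows. This is a simplified model of the dispatch described above, not Airflow's real code: `serialize_value` and `OpaqueUrl` are hypothetical, and the error text is illustrative. The attrs check relies on the `__attrs_attrs__` attribute that attrs sets on decorated classes.

```python
import dataclasses
from typing import Any

PRIMITIVES = (str, int, float, bool, type(None))


class OpaqueUrl:
    """Hypothetical stand-in for AnyUrl: no serialize(), not a dataclass, no attrs."""

    def __init__(self, url: str) -> None:
        self.url = url


def serialize_value(value: Any) -> Any:
    """Try each strategy in order; raise if none of them applies."""
    if isinstance(value, PRIMITIVES):
        return value
    if isinstance(value, dict):
        return {key: serialize_value(item) for key, item in value.items()}
    if hasattr(value, "serialize"):
        return value.serialize()
    if dataclasses.is_dataclass(value) and not isinstance(value, type):
        return dataclasses.asdict(value)
    if hasattr(type(value), "__attrs_attrs__"):
        return {
            attribute.name: getattr(value, attribute.name)
            for attribute in type(value).__attrs_attrs__
        }
    raise TypeError(f"cannot serialize object of type {type(value).__name__}")


# A python-mode dump keeps the URL object, so the chain runs out of options.
try:
    serialize_value({"name": "test", "url": OpaqueUrl("http://example.com/")})
except TypeError as exc:
    error = str(exc)
else:
    error = None
```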
---
## AnyUrl with mode='json' ✅ (CURRENT - WORKS)
**Serialization:**
1. Pydantic serializer calls `model.model_dump(mode='json')`
2. Pydantic converts AnyUrl → string
3. Returns: `{'name': 'test', 'url': 'http://example.com/'}`
4. Airflow serializes as string
**Deserialization:**
1. Calls `UrlModel.model_validate({'url': 'http://example.com/'})`
2. Pydantic expects AnyUrl, receives string
3. Pydantic has built-in validator for AnyUrl
4. Converts string → AnyUrl ✅
**Works because:** Pydantic knows how to rebuild AnyUrl from string.
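
The difference between the two modes for `AnyUrl` can be checked directly. This sketch assumes Pydantic v2 is installed; `UrlModel` mirrors the model name used in the steps above, and its exact fields are an assumption based on the example payload.

```python
from pydantic import AnyUrl, BaseModel


class UrlModel(BaseModel):
    name: str
    url: AnyUrl


model = UrlModel(name="test", url="http://example.com/")

# mode='python' keeps the URL object; mode='json' flattens it to a string.
as_python = model.model_dump(mode="python")
as_json = model.model_dump(mode="json")

# Validation rebuilds the URL object from the plain string.
restored = UrlModel.model_validate(as_json)
```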
---
## Summary
| Type | Mode | Result | Why |
|-------------------|--------|--------|-----|
| ObjectStoragePath | python | ✅ | Has `serialize()` method, Airflow handles it |
| ObjectStoragePath | json | ❌ | Pydantic converts to string, type info lost |
| AnyUrl | python | ❌ | No Airflow serializer for AnyUrl |
| AnyUrl | json | ✅ | Pydantic converts to/from string automatically |