amoghrajesh commented on code in PR #58992:
URL: https://github.com/apache/airflow/pull/58992#discussion_r2598055295
##########
airflow-core/src/airflow/api_fastapi/core_api/routes/public/xcom.py:
##########
@@ -101,16 +101,25 @@ def get_xcom_entry(
item = copy.copy(result)
if deserialize:
- # We use `airflow.serialization.serde` for deserialization here because custom XCom backends (with their own
- # serializers/deserializers) are only used on the worker side during task execution.
-
- # However, the XCom value is *always* stored in the metadata database as a valid JSON object.
- # Therefore, for purposes such as UI display or returning API responses, deserializing with
- # `airflow.serialization.serde` is safe and recommended.
- from airflow.serialization.serde import deserialize as serde_deserialize
-
- # full=False ensures that the `item` is deserialized without loading the classes, and it returns a stringified version
- item.value = serde_deserialize(XComModel.deserialize_value(item), full=False)
+ # Custom XCom backends may store references (eg: object storage paths) in the database.
+ # The custom XCom backend's deserialize_value() resolves these to actual values, but that is only
+ # used on workers during task execution. The API reads directly from the database and uses
+ # stringify() to convert DB values (references or serialized data) to human readable
+ # format for UI display or for API users.
+ import json
+
+ from airflow.serialization.stringify import stringify as stringify_xcom
+
+ try:
+ parsed_value = json.loads(result.value)
+ except (ValueError, TypeError):
+ # Already deserialized (e.g., set via Task Execution API)
+ parsed_value = result.value
Review Comment:
I thought about it a bit more: the task execution path cannot store a bad
value in the database, because the value goes through the serde filter beforehand
and anything wrong is caught early. So we can be sure that a value from the SDK path
is always reliably JSON-serializable and, being already deserialized, enters the "except" path:
```python
(airflow) ➜ airflow git:(move-serde-to-task-sdk) ✗ python
Python 3.13.3 (main, Apr 8 2025, 13:54:08) [Clang 17.0.0 (clang-1700.0.13.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> json.loads({1: 2})
Traceback (most recent call last):
  File "<python-input-3>", line 1, in <module>
    json.loads({1: 2})
    ~~~~~~~~~~^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.13.3_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/json/__init__.py", line 339, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
                    f'not {s.__class__.__name__}')
TypeError: the JSON object must be str, bytes or bytearray, not dict
>>> json.loads([1,2])
Traceback (most recent call last):
  File "<python-input-4>", line 1, in <module>
    json.loads([1,2])
    ~~~~~~~~~~^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.13.3_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/json/__init__.py", line 339, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
                    f'not {s.__class__.__name__}')
TypeError: the JSON object must be str, bytes or bytearray, not list
>>> json.loads("abcd")
Traceback (most recent call last):
  File "<python-input-5>", line 1, in <module>
    json.loads("abcd")
    ~~~~~~~~~~^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.13.3_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "/opt/homebrew/Cellar/[email protected]/3.13.3_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/json/decoder.py", line 345, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.13.3_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/json/decoder.py", line 363, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
>>> json.loads("{invalid json")
Traceback (most recent call last):
  File "<python-input-6>", line 1, in <module>
    json.loads("{invalid json")
    ~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.13.3_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "/opt/homebrew/Cellar/[email protected]/3.13.3_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/json/decoder.py", line 345, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.13.3_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/json/decoder.py", line 361, in raw_decode
    obj, end = self.scan_once(s, idx)
               ~~~~~~~~~~~~~~^^^^^^^^
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
```
The Core API, on the other hand, never stores deserialized Python objects, so its
values always enter the "try" path and should almost never fail to deserialize, unless
someone has written to the DB directly.
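
To make the two paths concrete, here is a minimal sketch of the same try/except pattern as in the diff. The helper name `parse_stored_value` is hypothetical and only for illustration; it is not part of Airflow. It relies on `json.JSONDecodeError` being a subclass of `ValueError`, which is why the `except (ValueError, TypeError)` clause covers both the non-string inputs and the invalid JSON strings.

```python
import json


def parse_stored_value(raw):
    """Hypothetical helper mirroring the try/except in the diff (not Airflow code)."""
    try:
        # "try" path: raw is a JSON string, e.g. read straight from the database.
        return json.loads(raw)
    except (ValueError, TypeError):
        # TypeError: raw is already a Python object such as a dict or list.
        # ValueError: raw is a str that is not valid JSON
        # (json.JSONDecodeError subclasses ValueError).
        return raw


print(parse_stored_value('{"a": 1}'))  # JSON string is decoded into a dict
print(parse_stored_value({1: 2}))      # dict is returned unchanged
print(parse_stored_value("abcd"))      # non-JSON string is returned unchanged
```

Note that a plain string that is not valid JSON is also returned unchanged, so the fallback never raises for these inputs.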