kaxil commented on code in PR #64626:
URL: https://github.com/apache/airflow/pull/64626#discussion_r3029912152
##########
airflow-core/src/airflow/serialization/encoders.py:
##########
@@ -162,6 +162,14 @@ def _ensure_serialized(d):
     if isinstance(trigger, dict):
         classpath = trigger["classpath"]
         kwargs = trigger["kwargs"]
+        # Unwrap any kwargs that are themselves serialized objects, to avoid
+        # double-serialization in the trigger's own serialize() method.
+        unwrapped = {}
+        for k, v in kwargs.items():
+            if isinstance(v, dict) and Encoding.TYPE in v:
+                unwrapped[k] = BaseSerialization.deserialize(v)
Review Comment:
I was wrong here, apologies @uranusjr. I verified with a unit test and the
unwrap is NOT a no-op.
The subtlety I missed: `_ensure_serialized`'s short-circuit returns the dict
with **string** keys (`"__type"`, `"__var"` from JSON deserialization), but the
unwrap path goes through `BaseSerialization.deserialize` then
`BaseSerialization.serialize` which re-creates the dict with **Encoding enum**
keys via `_encode`. While `json.dumps` treats both identically (Encoding is a
`str` subclass), `BaseSerialization.serialize` inside `BaseEventTrigger.hash()`
uses `str(k)` on dict keys, which produces different results:
- `str("__type")` -> `"__type"` (string key preserved)
- `str(Encoding.TYPE)` -> `"Encoding.TYPE"` (enum key mangled)
The DB side also gets the mangled form (via `serde` round-trip), so both
sides produce `"Encoding.TYPE"` and the hashes match.
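The key-mangling can be reproduced in isolation. Below is a minimal, Airflow-independent sketch: the local `Encoding` class is a stand-in mirroring the shape of the real str-mixin enum described above, not a copy of the actual module.

```python
import json
from enum import Enum


# Stand-in for Airflow's Encoding enum: a str-mixin Enum whose values are
# the JSON key strings (an assumption for illustration, not the real class).
class Encoding(str, Enum):
    TYPE = "__type"
    VAR = "__var"


# Keys as they come back from plain JSON deserialization (strings) ...
plain = {"__type": "dict", "__var": {}}
# ... vs keys as re-created by an _encode-style path (enum members).
re_encoded = {Encoding.TYPE: "dict", Encoding.VAR: {}}

# json.dumps uses the underlying string value of a str-subclass key,
# so the two dicts serialize identically:
assert json.dumps(plain) == json.dumps(re_encoded)

# But str() on a str-mixin Enum member yields "ClassName.MEMBER",
# not the value, so a str(k)-based hash sees different keys:
assert str("__type") == "__type"
assert str(Encoding.TYPE) == "Encoding.TYPE"
```

(Note this holds for a custom `class Encoding(str, Enum)`; Python 3.11's `StrEnum`, by contrast, defines `__str__` to return the value, which would hide the mangling.)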
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]