kaxil commented on code in PR #64626:
URL: https://github.com/apache/airflow/pull/64626#discussion_r3029144026
##########
airflow-core/src/airflow/serialization/encoders.py:
##########
@@ -162,6 +162,14 @@ def _ensure_serialized(d):
if isinstance(trigger, dict):
classpath = trigger["classpath"]
kwargs = trigger["kwargs"]
+ # unwrap any kwargs that are themselves serialized objects, to avoid
double-serialization in the trigger's own serialize() method.
+ unwrapped = {}
+ for k, v in kwargs.items():
+ if isinstance(v, dict) and Encoding.TYPE in v:
+ unwrapped[k] = BaseSerialization.deserialize(v)
Review Comment:
Not a hallucination -- let me walk through it.
The unwrap is a no-op because `_ensure_serialized` re-wraps everything
immediately after. Trace a tuple kwarg through `encode_trigger` when the input
is a dict (the path this PR modifies):
**Before the PR:**
1. `kwargs["topics"]` = `{"__type": "tuple", "__var": [1, 2]}` (already
serialized)
2. `_ensure_serialized` sees `Encoding.TYPE` in dict, returns as-is
3. Output: `{"__type": "tuple", "__var": [1, 2]}`
**After the PR:**
1. Unwrap: `BaseSerialization.deserialize({"__type": "tuple", "__var": [1,
2]})` -> `(1, 2)`
2. `_ensure_serialized((1, 2))` -> `BaseSerialization.serialize((1, 2))` ->
`{"__type": "tuple", "__var": [1, 2]}`
3. Output: `{"__type": "tuple", "__var": [1, 2]}`
Same output. The unwrap+re-serialize is a round-trip identity.
The actual hash mismatch lives in `BaseEventTrigger.hash`
(`triggers/base.py:206`):
```python
hash((classpath,
json.dumps(BaseSerialization.serialize(kwargs)).encode("utf-8")))
```
This calls `BaseSerialization.serialize()` on kwargs whose values are
already wrapped dicts like `{"__type": "tuple", "__var": [...]}`.
`BaseSerialization.serialize` treats any `dict` as `DAT.DICT`, so it
double-wraps:
- **DAG side**: `serialize({"__type": "tuple", "__var": [1, 2]})` produces
`{"__type": "dict", "__var": {"__type": "tuple", "__var": [1, 2]}}`
(double-wrapped as "dict")
- **DB side**: `Trigger.kwargs` returns raw Python via `serde.deserialize`
(its `_convert` function recognizes old-format `__type`/`__var` and
reconstructs the real object). So `serialize(())` produces `{"__type": "tuple",
"__var": []}` (single-wrapped, correctly typed)
Different JSON, different hash, phantom trigger recreation every heartbeat.
The tests in this PR also don't cover the gap:
`test_hash_matches_after_db_round_trip` creates triggers from
`BaseEventTrigger` objects (not dicts), so the new unwrap code never runs. And
`test_add_asset_trigger_references_hash_consistency` uses `FileDeleteTrigger`
with only primitive kwargs, so there's nothing to double-wrap.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]