kaxil commented on code in PR #64626:
URL: https://github.com/apache/airflow/pull/64626#discussion_r3028665314


##########
airflow-core/src/airflow/serialization/encoders.py:
##########
@@ -162,6 +162,14 @@ def _ensure_serialized(d):
     if isinstance(trigger, dict):
         classpath = trigger["classpath"]
         kwargs = trigger["kwargs"]
+        # unwrap any kwargs that are themselves serialized objects, to avoid 
double-serialization in the trigger's own serialize() method.
+        unwrapped = {}
+        for k, v in kwargs.items():
+            if isinstance(v, dict) and Encoding.TYPE in v:
+                unwrapped[k] = BaseSerialization.deserialize(v)

Review Comment:
   This unwrap step is a no-op. After unwrapping, `_ensure_serialized` on line 
169 re-serializes each value right back to the same form. Walk through it with 
a tuple kwarg `(1, 2)` stored as `{"__type": "tuple", "__var": [1, 2]}`: before 
the PR, `_ensure_serialized` sees `Encoding.TYPE` in the dict and returns it 
as-is; after the PR, unwrap deserializes to `(1, 2)`, then `_ensure_serialized` 
calls `BaseSerialization.serialize((1,2))` producing `{"__type": "tuple", 
"__var": [1, 2]}`. Same output either way.
   
   The root cause of the hash mismatch in `add_asset_trigger_references` is 
that `BaseEventTrigger.hash()` internally calls 
`BaseSerialization.serialize(kwargs)` again on kwargs that are already 
serialized by `encode_trigger`. The DB side doesn't double-wrap because 
`_decrypt_kwargs` returns raw Python objects. The fix needs to happen where the 
hash is computed (as PR #64625 does), not inside `encode_trigger`.



##########
airflow-core/src/airflow/serialization/encoders.py:
##########
@@ -162,6 +162,14 @@ def _ensure_serialized(d):
     if isinstance(trigger, dict):
         classpath = trigger["classpath"]
         kwargs = trigger["kwargs"]
+        # unwrap any kwargs that are themselves serialized objects, to avoid 
double-serialization in the trigger's own serialize() method.
+        unwrapped = {}
+        for k, v in kwargs.items():
+            if isinstance(v, dict) and Encoding.TYPE in v:
+                unwrapped[k] = BaseSerialization.deserialize(v)
+            else:
+                unwrapped[k] = v
+        kwargs = unwrapped
     else:
         classpath, kwargs = trigger.serialize()
     return {

Review Comment:
   No tests for this change. A serialization fix to the asset watcher trigger 
persistence path should include a test that verifies hash consistency between 
the DAG-parsed and DB-stored paths -- something like: create a trigger with 
non-primitive kwargs (tuple, set), round-trip through 
`encrypt_kwargs`/`_decrypt_kwargs`, and verify `BaseEventTrigger.hash` produces 
the same value for both the original and round-tripped kwargs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to