kaxil commented on code in PR #67418:
URL: https://github.com/apache/airflow/pull/67418#discussion_r3295506167


##########
shared/state/src/airflow_shared/state/__init__.py:
##########
@@ -203,7 +205,7 @@ def cleanup(self) -> None:
         ``[state_store] default_retention_days``) and deciding what to delete.
         """
 
-    def serialize_task_state_to_ref(self, *, value: str, key: str, ti_id: str) 
-> str:
+    def serialize_task_state_to_ref(self, *, value: JsonValue, key: str, 
ti_id: str) -> str:

Review Comment:
   Follow-up on the JSON-encoding change: the abstract docstrings here (and on 
`deserialize_task_state_from_ref` / `serialize_asset_state_to_ref` / 
`deserialize_asset_state_from_ref` below) still describe `value` / the return 
as an opaque string with no mention of the encoding contract.
   
   The Execution API routes now `json.dumps(body.value)` into the result of 
`set()` and `json.loads()` whatever `get()` returns. So any custom 
`BaseStateBackend` that overrides these methods MUST:
   
   - Have `serialize_*_to_ref` return a string that's a valid JSON document (so 
the route's `json.dumps` produces a doubly-quoted value safe for storage)
   - Have `deserialize_*_from_ref` return something `json.loads`-friendly, 
since the route deserializes again on the GET side
   
   If a third-party backend stores values verbatim from a non-JSON source (e.g. 
a pre-existing rows table where `value` is plain text), the GET route will 
raise on `json.loads` and surface as a 500. Worth one sentence on each abstract 
method pinning the contract down, e.g.: "The returned string MUST be a 
JSON-encoded value; the Execution API calls `json.loads` on what `get()` 
returns."



##########
airflow-core/tests/unit/api_fastapi/execution_api/versions/head/test_task_state.py:
##########
@@ -155,6 +186,16 @@ def test_put_null_value_returns_422(self, client: 
TestClient, create_task_instan
 
         assert response.status_code == 422
 
+    def test_put_nan_returns_422(self, client: TestClient, 
create_task_instance: CreateTaskInstance):
+        ti = create_task_instance()
+        # send NaN as raw JSON (json module rejects it but try sending with a 
allow_nan=True to bypass validation)
+        response = client.put(
+            _api_url(ti.id, "job_id"),
+            content=json.dumps({"value": float("nan")}, 
allow_nan=True).encode(),

Review Comment:
   Nit: the validator now rejects `not math.isfinite(v)` (i.e. NaN AND +/-Inf), 
but this test only exercises NaN. A regression that swapped `isfinite` for 
`isnan` would silently let Inf through and this test would still pass.
   
   Worth parametrizing over `float("nan")`, `float("inf")`, `float("-inf")` 
here (and in `test_asset_state.py:173`) so the Inf branch is locked in too. 
One-liner with `@pytest.mark.parametrize`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to