kaxil opened a new pull request, #67932:
URL: https://github.com/apache/airflow/pull/67932

   The worker-side walk that registers an operator's structured-output classes 
for XCom deserialization (`_register_deserialization_allowed_classes`, reading 
each operator's `output_type`) only registers the top-level declared type. 
`iter_pydantic_models` walks the *annotation shape* (`Optional` / `Union` / 
`list[...]`) but never recurses into a model's own fields, so a model nested 
inside the declared type is never added to the per-process deserialization 
allow-list.
   
   With a model that nests another model:
   
   ```python
   class SubQuestion(BaseModel): ...
   class DecomposedQuestion(BaseModel):
       sub_questions: list[SubQuestion]
   
   @task.llm(output_type=DecomposedQuestion)
   def decompose(...): ...
   ```
   
   a downstream task that returns the nested model to XCom (`return 
decomposed.sub_questions` -> `list[SubQuestion]`) breaks its consumer: 
resolving the consumer's input fails with:
   
   ```
   ImportError: ...SubQuestion was not found in allow list for deserialization imports.
   ```
   
   `DecomposedQuestion` is registered (it is the declared `output_type`), but 
`SubQuestion`, reachable only through its field, is not.
   
   ## Fix
   
   After yielding a model, push its field annotations onto the walk stack so 
every reachable model is yielded and registered. The existing `seen` set makes 
self-referential and mutually recursive model graphs terminate.
   
   Behavior on the example above, exercising the real walk:
   
   - before: allow-list = `{DecomposedQuestion}`; deserializing 
`list[SubQuestion]` raises the ImportError
   - after: allow-list = `{DecomposedQuestion, SubQuestion}`; both hops 
deserialize
   
   New unit tests cover field recursion (including container-typed fields) and 
self-reference termination; the rest of the serde suite passes unchanged.
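
For reviewers who want the shape of the change without opening the diff, here is a minimal stdlib-only sketch of the walk. It is not the actual `iter_pydantic_models` code. `Model`, its `fields` mapping, `TreeNode`, `iter_models`, and the `recurse_fields` flag are all hypothetical stand-ins; with real Pydantic v2 models the field annotations would presumably come from `model_fields[...].annotation` instead.

```python
from typing import Iterator, Optional, get_args


class Model:
    """Hypothetical stand-in for a Pydantic model (not the real BaseModel).

    `fields` maps field name -> annotation, playing the role of `model_fields`.
    """

    fields: dict = {}


class SubQuestion(Model):
    fields = {"text": str}


class DecomposedQuestion(Model):
    fields = {"sub_questions": list[SubQuestion]}


class TreeNode(Model):
    pass


# Self-referential model; assigned after the class exists so no forward refs are needed.
TreeNode.fields = {"children": list[TreeNode], "parent": Optional[TreeNode]}


def iter_models(annotation: object, recurse_fields: bool = True) -> Iterator[type]:
    """Yield each Model subclass reachable from `annotation` exactly once.

    recurse_fields=False mimics the behavior before the fix (annotation shape only).
    """
    seen: set = set()
    stack = [annotation]
    while stack:
        current = stack.pop()
        args = get_args(current)
        if args:
            # Optional[...] / Union[...] / list[...]: unwrap the annotation shape.
            stack.extend(args)
            continue
        if not (isinstance(current, type) and issubclass(current, Model)):
            continue  # str, NoneType, and other non-model leaves
        if current in seen:
            continue  # already yielded; this is what makes cycles terminate
        seen.add(current)
        yield current
        if recurse_fields:
            # The step the fix adds: push the model's own field annotations.
            stack.extend(current.fields.values())


before = set(iter_models(DecomposedQuestion, recurse_fields=False))
after = set(iter_models(DecomposedQuestion))
```

In this sketch `before` contains only `DecomposedQuestion`, while `after` also contains `SubQuestion`. Walking `TreeNode` yields it once and stops, because the `seen` check skips the second visit.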
   
   ## Context
   
   Surfaced while fixing the common.ai 10-K example DAGs (#67930). Those 
examples now side-step it by pushing dicts (`serialize_output=True`), but the 
underlying gap affects any DAG that passes a nested Pydantic model between 
tasks, so it is worth fixing in core.
   


