kaxil opened a new pull request, #67389:
URL: https://github.com/apache/airflow/pull/67389

   ## Summary
   
   `@task.agent` and the four sibling LLM decorators (`@task.llm`, 
`@task.llm_branch`, `@task.llm_schema_compare`, `@task.llm_sql`) currently 
reject any non-string return value from the user's callable:
   
   ```python
   if not isinstance(self.prompt, str) or not self.prompt.strip():
       raise TypeError("...must be a non-empty string.")
   ```
   
   But pydantic-ai's `Agent.run_sync` accepts `str | Sequence[UserContent]`, 
and these operators pass `self.prompt` straight through. The string-only 
constraint lives only in the decorator's `execute` -- there's no architectural 
reason for it.
   
   This PR widens the validation so the callable may return a `Sequence` of 
pydantic-ai `UserContent` items (`TextContent`, `ImageUrl`, `AudioUrl`, 
`DocumentUrl`, `VideoUrl`, `BinaryContent`, `UploadedFile`, `CachePoint`) in 
addition to `str`. Vision, audio, and document inputs to pydantic-ai agents now 
work through the TaskFlow decorator path without falling back to 
`PydanticAIHook.create_agent()` inside a plain `@task`.
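
A minimal sketch of what the widened check could look like, assuming a hypothetical helper named `validate_prompt` and a stand-in `FakeImageUrl` class so the snippet runs without pydantic-ai installed. It is not the code in this PR, only an illustration of "non-empty `str` or non-empty `Sequence` of content parts":

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass
class FakeImageUrl:
    """Hypothetical stand-in for pydantic_ai.messages.ImageUrl."""

    url: str


def validate_prompt(prompt: object) -> None:
    """Accept a non-empty str or a non-empty sequence of parts; raise TypeError otherwise."""
    if isinstance(prompt, str):
        if not prompt.strip():
            raise TypeError("prompt must be a non-empty string.")
        return
    # bytes and bytearray are Sequences too, but they are not lists of content parts.
    is_parts = isinstance(prompt, Sequence) and not isinstance(prompt, (bytes, bytearray))
    if is_parts and len(prompt) > 0:
        return
    raise TypeError("prompt must be a non-empty string or a non-empty sequence of content parts.")


# Both shapes pass; anything else raises TypeError.
validate_prompt("Describe this image")
validate_prompt(["Describe:", FakeImageUrl(url="https://example.com/sample.png")])
```

This sketch only checks the container shape. Whether each item is also checked against the `UserContent` types is not something it assumes.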
   
   ## Usage
   
   ```python
   from pydantic_ai.messages import ImageUrl
   from airflow.sdk import dag, task
   
   @dag(...)
   def vision_pipeline():
       @task.agent(
           llm_conn_id="pydantic_ai_default",
           system_prompt="You are a careful image analyst.",
       )
       def describe(image_url: str):
           return ["Describe what you see in this image:", ImageUrl(url=image_url)]
   
       describe("https://example.com/sample.png")
   ```
   
   ## Design rationale
   
   **Why decorator-only widening (operator `__init__` types unchanged):** 
Direct operator instantiation (`AgentOperator(prompt=...)`) is supported but 
uncommon -- the decorator path covers the primary use case. Widening the 
operator `__init__` annotation would also tempt direct callers into shapes the 
rendered-fields capture path doesn't handle well. Decorator-only widening is a 
clean partial step; the operator `prompt: str` annotation stays, and 
direct-multimodal callers fall back to the same hook-level pattern they had 
before.
   
   **Why three layers of guards** (decorator preflight → operator preflight → 
mixin guard): each layer catches a different bypass scenario:
   - Decorator preflight (`@task.agent` + `enable_hitl_review=True` + 
Sequence): fails fast on the obvious case, before `render_template_fields` runs.
   - Operator preflight (`AgentOperator.execute` checking `self.prompt` after 
the Task SDK has rendered templates): catches the *native template rendering* 
bypass (`prompt="{{ params.parts }}"` rendering into a Sequence at execute 
time) as well as direct-operator construction.
   - `LLMApprovalMixin.defer_for_approval` guard: backstop in case any path 
bypasses the operator-level check; also prevents raw bytes from a 
`BinaryContent` from being interpolated into the human review body.
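
The template-rendering bypass can be sketched as follows. Both `require_str_prompt` and `fake_render` are hypothetical names invented for this illustration (the latter stands in for native template rendering). The point is only that a check made before rendering sees a `str`, while the same check after rendering sees a list:

```python
def require_str_prompt(prompt: object, feature: str) -> None:
    """Raise TypeError when a review feature is on and the prompt is not a str."""
    if not isinstance(prompt, str):
        raise TypeError(f"{feature} needs a str prompt, got {type(prompt).__name__}.")


def fake_render(template: str, params: dict) -> object:
    """Hypothetical stand-in for native rendering of '{{ params.parts }}'."""
    return params["parts"] if template == "{{ params.parts }}" else template


template = "{{ params.parts }}"
# An early check passes: the unrendered template is still a str.
require_str_prompt(template, "enable_hitl_review")

# After rendering, the prompt is a list, so the later check has to catch it.
rendered = fake_render(template, {"parts": ["Describe:", {"image": "https://example.com/sample.png"}]})
try:
    require_str_prompt(rendered, "enable_hitl_review")
    caught = False
except TypeError:
    caught = True
```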
   
   **Why HITL/approval are blocked rather than coerced**: 
`AgentSessionData.prompt: str` and `SessionResponse.prompt: str` (plugin + 
frontend) assume a string today. Silently stringifying a list into 
`repr(['Describe:', ImageUrl(url='...')])` would expose object reprs (and 
embedded bytes) in the review UI. Failing loudly is the right v1 behaviour. 
Widening the session model + review UI is tracked as a follow-up on the [AIP-99 
board](https://github.com/orgs/apache/projects/586).
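
To make the coercion concern concrete, here is a small sketch using a hypothetical `FakeBinaryContent` dataclass in place of pydantic-ai's `BinaryContent` (the real class's repr may differ). Stringifying the list puts the class name and the raw bytes into the text a reviewer would see:

```python
from dataclasses import dataclass


@dataclass
class FakeBinaryContent:
    """Hypothetical stand-in for pydantic-ai's BinaryContent."""

    data: bytes
    media_type: str


parts = ["Describe:", FakeBinaryContent(data=b"\x89PNG", media_type="image/png")]

# Naive coercion: the object repr and the byte literal both end up in the string.
coerced = str(parts)
leaks_repr = "FakeBinaryContent(" in coerced
leaks_bytes = "\\x89PNG" in coerced
```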
   
   **Why `llm_file_analysis` keeps the string-only check**: that operator 
builds `request.user_content` from `prompt + files` -- prompt is intentionally 
a string description and files are supplied separately. Multimodal is already 
supported there through the `files` kwarg. A one-line code comment documents 
this.
   
   ## Gotchas / known limitations
   
   - **HITL incompatibility**: `enable_hitl_review=True` + Sequence prompt 
raises `TypeError` before the agent runs. Workaround: return a `str` prompt, or 
disable HITL review. Follow-up: widen `AgentSessionData.prompt` and the HITL 
review UI.
   - **Approval incompatibility**: `require_approval=True` + Sequence prompt 
raises `TypeError` before the agent runs on `@task.llm` and `@task.llm_sql`. 
On `@task.llm_branch` and `@task.llm_schema_compare` the inherited approval 
path is a no-op (a pre-existing bug, tracked as a separate follow-up).
   - **Direct-operator type annotation drift**: `AgentOperator.__init__` still 
types `prompt: str` even though the runtime accepts more for the decorator 
path. mypy users instantiating the operator directly with a `Sequence` will see a 
type error; supported usage remains through the decorator. Widening 
direct-operator typing requires a safer rendered-fields representation for 
non-str prompts, which is out of scope for this PR.
   - **`Rendered Fields` UI**: for the decorator path, `self.prompt` is 
`SET_DURING_EXECUTION` at the pre-execute render_fields capture, so the UI 
shows `"DYNAMIC (set during execution)"` regardless of prompt shape. No bytes 
leak.
   
   ## Follow-ups (tracked on AIP-99 board)
   
   - Widen `AgentSessionData` / `SessionResponse` to support multimodal prompts 
in HITL review.
   - Fix pre-existing `require_approval=True` no-op on `LLMBranchOperator` / 
`LLMSchemaCompareOperator`.
   - Render multimodal prompts safely in `LLMApprovalMixin` review body (remove 
the guard once safe).
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [ ] 
   
   

