kaxil opened a new pull request, #67389:
URL: https://github.com/apache/airflow/pull/67389
## Summary
`@task.agent` and the four sibling LLM decorators (`@task.llm`,
`@task.llm_branch`, `@task.llm_schema_compare`, `@task.llm_sql`) currently
reject any non-string return value from the user's callable:
```python
if not isinstance(self.prompt, str) or not self.prompt.strip():
    raise TypeError("...must be a non-empty string.")
```
But pydantic-ai's `Agent.run_sync` accepts `str | Sequence[UserContent]`,
and these operators pass `self.prompt` straight through. The string-only
constraint lives only in the decorator's `execute` -- there's no architectural
reason for it.
This PR widens the validation so the callable may return a `Sequence` of
pydantic-ai `UserContent` items (`TextContent`, `ImageUrl`, `AudioUrl`,
`DocumentUrl`, `VideoUrl`, `BinaryContent`, `UploadedFile`, `CachePoint`) in
addition to `str`. Vision, audio, and document inputs to pydantic-ai agents now
work through the TaskFlow decorator path without falling back to
`PydanticAIHook.create_agent()` inside a plain `@task`.
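As a rough illustration of the widened check, here is a minimal sketch. It is not the code in this PR: `validate_prompt` is a hypothetical helper name, the error messages are made up, and `ImageUrl` is a local stand-in for `pydantic_ai.messages.ImageUrl` so the snippet runs without pydantic-ai installed. It also assumes empty sequences are rejected, which the description above does not state.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass
class ImageUrl:
    """Stand-in for pydantic_ai.messages.ImageUrl so the sketch runs alone."""

    url: str


def validate_prompt(prompt: object) -> None:
    """Hypothetical helper: accept a non-empty str or a non-empty Sequence."""
    if isinstance(prompt, str):
        if not prompt.strip():
            raise TypeError("prompt must be a non-empty string.")
        return
    # bytes and bytearray are Sequences too, but are not prompt containers.
    if isinstance(prompt, Sequence) and not isinstance(prompt, (bytes, bytearray)):
        if len(prompt) == 0:
            raise TypeError("prompt sequence must not be empty.")
        return
    raise TypeError("prompt must be a str or a Sequence of UserContent items.")


# Both shapes pass without raising.
validate_prompt("Describe this image")
validate_prompt(["Describe this image:", ImageUrl(url="https://example.com/sample.png")])
```

Checking `str` first matters because a `str` is itself a `Sequence`; without that ordering, a whitespace-only string would slip through the sequence branch.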
## Usage
```python
from pydantic_ai.messages import ImageUrl

from airflow.sdk import dag, task


@dag(...)
def vision_pipeline():
    @task.agent(
        llm_conn_id="pydantic_ai_default",
        system_prompt="You are a careful image analyst.",
    )
    def describe(image_url: str):
        return ["Describe what you see in this image:", ImageUrl(url=image_url)]

    describe("https://example.com/sample.png")
```
## Design rationale
**Why decorator-only widening (operator `__init__` types unchanged):**
Direct operator instantiation (`AgentOperator(prompt=...)`) is supported but
uncommon -- the decorator path covers the primary use case. Widening the
operator `__init__` annotation would also tempt direct callers into shapes the
rendered-fields capture path doesn't handle well. Decorator-only widening is a
clean partial step; the operator `prompt: str` annotation stays, and
direct-multimodal callers fall back to the same hook-level pattern they had
before.
**Why three layers of guards** (decorator preflight → operator preflight →
mixin guard): each layer catches a different bypass scenario:
- Decorator preflight (`@task.agent` + `enable_hitl_review=True` +
Sequence): fails fast on the obvious case before `render_template_fields` runs.
- Operator preflight (`AgentOperator.execute` checking `self.prompt` after
the Task SDK has rendered templates): catches the *native template rendering*
bypass -- `prompt="{{ params.parts }}"` rendering into a Sequence at execute
time -- and direct-operator construction.
- `LLMApprovalMixin.defer_for_approval` guard: backstop in case any path
bypasses the operator-level check; also prevents raw bytes from a
`BinaryContent` from being interpolated into the human review body.
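The three layers can be read as the same kind of check applied at different points in the task lifecycle. The sketch below shows one such check in isolation; the function name `reject_non_str_prompt_for_review` and the message wording are assumptions for illustration, and only `enable_hitl_review` and the `TypeError` come from the description above.

```python
def reject_non_str_prompt_for_review(
    prompt: object, *, enable_hitl_review: bool
) -> None:
    """Hypothetical preflight: refuse non-string prompts while review is on."""
    if enable_hitl_review and not isinstance(prompt, str):
        raise TypeError(
            "enable_hitl_review=True requires a str prompt; "
            f"got {type(prompt).__name__}."
        )


# A string prompt passes with review enabled.
reject_non_str_prompt_for_review("Summarise the report", enable_hitl_review=True)

# A list prompt passes only when review is disabled.
reject_non_str_prompt_for_review(["Describe:", object()], enable_hitl_review=False)
```

Running the check again after template rendering is what covers the case where a templated string such as `"{{ params.parts }}"` renders into a list: the value looks like a `str` at decoration time and only becomes a Sequence at execute time.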
**Why HITL/approval are blocked rather than coerced**:
`AgentSessionData.prompt: str` and `SessionResponse.prompt: str` (plugin +
frontend) assume a string today. Silently stringifying a list into
`repr(['Describe:', ImageUrl(url='...')])` would expose object reprs (and
embedded bytes) in the review UI. Failing loudly is the right v1 behaviour.
Widening the session model + review UI is tracked as a follow-up on the [AIP-99
board](https://github.com/orgs/apache/projects/586).
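To make the stringification concern concrete, here is a small self-contained sketch. `BinaryContent` here is a local dataclass standing in for the pydantic-ai class (the real one has more fields and its repr may differ), and `review_body` is a hypothetical name for whatever string would end up in the session model.

```python
from dataclasses import dataclass


@dataclass
class BinaryContent:
    """Stand-in for pydantic_ai.messages.BinaryContent, for illustration only."""

    data: bytes
    media_type: str


parts = ["Describe:", BinaryContent(data=b"\x89PNG\r\n", media_type="image/png")]

# A naive coercion to satisfy a `prompt: str` field.
review_body = str(parts)
print(review_body)
```

The printed text contains the class name and the escaped byte payload rather than anything a reviewer could act on, which is the behaviour the guard avoids.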
**Why `llm_file_analysis` keeps the string-only check**: that operator
builds `request.user_content` from `prompt + files` -- prompt is intentionally
a string description and files are supplied separately. Multimodal is already
supported there through the `files` kwarg. A one-line code comment documents
this.
## Gotchas / known limitations
- **HITL incompatibility**: `enable_hitl_review=True` + Sequence prompt
raises `TypeError` before the agent runs. Workaround: return a `str` prompt, or
disable HITL review. Follow-up: widen `AgentSessionData.prompt` and the HITL
review UI.
- **Approval incompatibility**: `require_approval=True` + Sequence prompt
raises `TypeError` before the agent runs (on `@task.llm` and `@task.llm_sql`;
the inherited approval path is a no-op on `@task.llm_branch` and
`@task.llm_schema_compare` -- pre-existing bug, separate follow-up).
- **Direct-operator type annotation drift**: `AgentOperator.__init__` still
types `prompt: str` even though the runtime accepts more for the decorator
path. mypy users instantiating the operator directly with a `Sequence` see the
type warning; supported usage remains through the decorator. Widening
direct-operator typing requires a safer rendered-fields representation for
non-str prompts, which is out of scope for this PR.
- **`Rendered Fields` UI**: for the decorator path, `self.prompt` is
`SET_DURING_EXECUTION` at the pre-execute render_fields capture, so the UI
shows `"DYNAMIC (set during execution)"` regardless of prompt shape. No bytes
leak.
## Follow-ups (tracked on AIP-99 board)
- Widen `AgentSessionData` / `SessionResponse` to support multimodal prompts
in HITL review.
- Fix pre-existing `require_approval=True` no-op on `LLMBranchOperator` /
`LLMSchemaCompareOperator`.
- Render multimodal prompts safely in `LLMApprovalMixin` review body (remove
the guard once safe).
---
##### Was generative AI tooling used to co-author this PR?
- [ ]