kaxil opened a new pull request, #68407:
URL: https://github.com/apache/airflow/pull/68407
## Summary

Adds an opt-in `code_mode=True` flag to `AgentOperator` (and `@task.agent`) that wraps the agent's tools in a single `run_code` tool executed in pydantic-ai's [Monty](https://github.com/pydantic/monty) sandbox, via the new `[code-mode]` extra (`pydantic-ai-harness[codemode]`). For multi-tool workflows, the model writes one Python snippet that calls the tools as functions (with loops, conditionals, and `asyncio.gather`) instead of making one model round-trip per tool call, which cuts both round-trips and token use.

## Design rationale

- **Why an extra, not a base dependency.** `pydantic-monty` is pre-1.0 and fast-moving (a Rust/native wheel). Pinning it as a hard dependency of every common.ai install would let its churn, or a missing wheel for some platform, break the whole provider for users who never touch code mode. It is gated behind the `code-mode` extra and raises `AirflowOptionalProviderFeatureException` when used without it, the same pattern the provider already uses for `mcp`, `sql`, and `skills`.
- **Why a `bool` flag rather than passing the capability through `agent_params`.** Capability instances aren't round-trip-safe through DAG serialization (see the existing "Capabilities" docs note). `code_mode` is a plain boolean; the `CodeMode` capability is constructed at execution time in `_build_agent` and is never stored on the serialized operator.
- **The tool boundary is unchanged.** `CodeMode` collapses the tools you registered into one `run_code` tool. The generated code runs deny-by-default (no filesystem, network, or env access) and can only call those tools, which still execute in the worker. Code mode changes *how* the model invokes tools, not *what* it can reach.
- **Toolset return schemas.** `HookToolset` and `SQLToolset` now set `return_schema` on their tool definitions so code mode renders `-> str` instead of `-> Any`. Both always return serialized strings (`_serialize_for_llm` / `json.dumps`), so `{"type": "string"}` is accurate.
  The `return_schema` kwarg is applied through a small version-guarded helper because `ToolDefinition.return_schema` is newer than the provider's pydantic-ai floor.

## Usage

```python
AgentOperator(
    task_id="analyst",
    prompt="For the top 3 customers by order count, what was each one's total spend?",
    llm_conn_id="pydanticai_default",
    system_prompt="You are a SQL analyst. Write Python that calls the tools to answer.",
    toolsets=[SQLToolset(db_conn_id="postgres_default", allowed_tables=["customers", "orders"])],
    code_mode=True,  # pip install "apache-airflow-providers-common-ai[code-mode]"
)
```

## Gotchas / limitations

- Incompatible with `durable=True`: durable replay caches individual model/tool steps via a shared step counter that assumes a stable call order across runs, which code mode's free-form generated Python does not guarantee. The combination is rejected at construction (mirroring the existing `durable` + `enable_hitl_review` guard).
- Monty supports a subset of Python and no third-party imports; it sandboxes the glue code between tool calls and is not a general code runtime.
- Draft: opened for early review. The real `CodeMode` round-trip is exercised via a local breeze spike (the harness isn't in CI), and the unit tests cover the provider-owned wiring (build-or-raise, capability injection) with the harness mocked.
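
To make the round-trip saving from the Summary concrete, here is a minimal, self-contained sketch of the kind of snippet a model might hand to `run_code` for the Usage prompt. Everything in it is hypothetical: the two tool functions, their names, and the in-memory data stand in for real toolset tools (the actual `SQLToolset` tools return serialized strings), and the snippet is wrapped in an `async def` so it runs under plain CPython rather than inside Monty. It shows four tool calls, chained and fanned out with `asyncio.gather`, all issued from what would be a single model turn.

```python
import asyncio
from typing import Dict, List

# Hypothetical in-memory data standing in for the customers/orders tables.
ORDERS: Dict[str, List[float]] = {
    "acme": [120.0, 80.5],
    "globex": [42.0],
    "initech": [10.0, 15.0, 5.0],
    "umbrella": [99.0],
}

CALLS: List[str] = []  # records each tool invocation made by the snippet


async def top_customers_by_order_count(limit: int) -> List[str]:
    """Hypothetical tool: customers ranked by number of orders."""
    CALLS.append("top_customers_by_order_count")
    ranked = sorted(ORDERS, key=lambda name: len(ORDERS[name]), reverse=True)
    return ranked[:limit]


async def total_spend(customer: str) -> float:
    """Hypothetical tool: sum of one customer's order amounts."""
    CALLS.append("total_spend")
    return sum(ORDERS[customer])


async def generated_snippet() -> Dict[str, float]:
    # What a model might write: one lookup, then a concurrent fan-out,
    # instead of one model round-trip per tool call.
    top = await top_customers_by_order_count(3)
    totals = await asyncio.gather(*(total_spend(c) for c in top))
    return dict(zip(top, totals))


result = asyncio.run(generated_snippet())
print(result)
print(f"{len(CALLS)} tool calls from one run_code invocation")
```

Without code mode, the same answer would cost a separate model round-trip for each of those tool calls.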
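
The build-or-raise pattern from the first Design rationale bullet can be sketched as follows. This is a minimal illustration, not the provider's code: `import_optional` and `build_code_mode` are hypothetical helpers, the exception class is a local stand-in for Airflow's `AirflowOptionalProviderFeatureException` so the snippet runs without Airflow installed, and `hypothetical_codemode_harness` is a placeholder module name rather than the real import path of `pydantic-ai-harness`.

```python
import importlib
from types import ModuleType
from typing import Optional


class AirflowOptionalProviderFeatureException(Exception):
    """Local stand-in for Airflow's exception of the same name."""


def import_optional(module_name: str, extra: str) -> ModuleType:
    """Import a module that ships only with a provider extra, or raise a clear error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise AirflowOptionalProviderFeatureException(
            f"{module_name!r} is not installed. Install it with "
            f'pip install "apache-airflow-providers-common-ai[{extra}]"'
        ) from err


def build_code_mode(code_mode: bool) -> Optional[ModuleType]:
    # The optional dependency is only imported when the feature is requested,
    # so installs without the extra keep working for everyone else.
    if not code_mode:
        return None
    return import_optional("hypothetical_codemode_harness", extra="code-mode")
```

Because the import only happens when `code_mode=True`, DAGs that never enable code mode are unaffected by whether the extra is installed.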
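
The second Design rationale bullet (store a plain `bool`, build the capability only at execution time) can be illustrated with a toy model. `AgentTaskSketch` and `CodeModeCapability` are hypothetical stand-ins, not Airflow or pydantic-ai classes, the tool names are made up, and JSON is used as a simplified proxy for DAG serialization.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Sequence


class CodeModeCapability:
    """Hypothetical stand-in for a capability object; it is not JSON-serializable."""

    def __init__(self, tools: Sequence[str]) -> None:
        self.tools = list(tools)


@dataclass
class AgentTaskSketch:
    """Hypothetical operator-like config that stores only plain, serializable fields."""

    task_id: str
    prompt: str
    code_mode: bool = False

    def serialize(self) -> str:
        # A bool survives a JSON round-trip; a capability instance would raise TypeError.
        return json.dumps(asdict(self))

    @classmethod
    def deserialize(cls, payload: str) -> "AgentTaskSketch":
        return cls(**json.loads(payload))

    def build_capabilities(self, tools: Sequence[str]) -> List[CodeModeCapability]:
        # Built from the flag at execution time and never stored on the instance.
        return [CodeModeCapability(tools)] if self.code_mode else []


task = AgentTaskSketch(task_id="analyst", prompt="Top customers by spend?", code_mode=True)
restored = AgentTaskSketch.deserialize(task.serialize())
capabilities = restored.build_capabilities(["query", "list_tables"])
```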
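
The version-guarded helper mentioned under Toolset return schemas could look roughly like this. It is a sketch under stated assumptions: the two dataclasses are stand-ins for `ToolDefinition` from an older and a newer pydantic-ai release, reduced to the fields that matter here, and `return_schema_kwargs` / `make_string_tool_def` are hypothetical names. The helper passes `return_schema` only when the constructor accepts it, detected with `inspect.signature`.

```python
import inspect
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class OldToolDefinition:
    """Stand-in for a ToolDefinition from a release without return_schema."""

    name: str


@dataclass
class NewToolDefinition:
    """Stand-in for a ToolDefinition from a release that accepts return_schema."""

    name: str
    return_schema: Optional[Dict[str, Any]] = None


def return_schema_kwargs(tool_def_cls: type, schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return {"return_schema": schema} only if tool_def_cls accepts that keyword."""
    accepted = inspect.signature(tool_def_cls).parameters
    return {"return_schema": schema} if "return_schema" in accepted else {}


def make_string_tool_def(tool_def_cls: type, name: str) -> Any:
    # Toolset tools always return serialized strings, so the schema is a plain string type.
    return tool_def_cls(name=name, **return_schema_kwargs(tool_def_cls, {"type": "string"}))


new_def = make_string_tool_def(NewToolDefinition, "query")
old_def = make_string_tool_def(OldToolDefinition, "query")
```

The PR only describes the helper as version-guarded, so it may compare pydantic-ai versions instead; signature inspection is just one way to implement the same guard.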
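
The `durable=True` limitation in the Gotchas can be seen in a toy model. The sketch is hypothetical and far simpler than the provider's durable mode: `StepReplayCache` keys results only by a shared step counter, so if a replay issues the same tool calls in a different order, it silently returns cached results for the wrong calls. `check_combination` mimics the construction-time rejection in spirit; its name and error message are illustrative.

```python
from typing import Any, Callable, Dict


class StepReplayCache:
    """Hypothetical durable-style cache: results are keyed only by step index."""

    def __init__(self) -> None:
        self.results: Dict[int, Any] = {}
        self.step = 0

    def start_run(self) -> None:
        # Each (re)run restarts the shared counter from zero.
        self.step = 0

    def call(self, tool: Callable[[str], Any], arg: str) -> Any:
        key = self.step
        self.step += 1
        if key in self.results:
            # Replay path: assumes step N is the same call it was last time.
            return self.results[key]
        self.results[key] = tool(arg)
        return self.results[key]


SPEND = {"acme": 200.5, "globex": 42.0}


def total_spend(customer: str) -> float:
    """Hypothetical tool."""
    return SPEND[customer]


def check_combination(durable: bool, code_mode: bool) -> None:
    """Reject the unsupported combination up front, before any step is cached."""
    if durable and code_mode:
        raise ValueError("code_mode=True is not supported together with durable=True")


cache = StepReplayCache()

cache.start_run()  # first run: calls arrive as acme, then globex
first = {c: cache.call(total_spend, c) for c in ["acme", "globex"]}

cache.start_run()  # replay: the same calls arrive in the opposite order
second = {c: cache.call(total_spend, c) for c in ["globex", "acme"]}
```

In the replay, `second["globex"]` comes back as acme's total, which is why rejecting the combination at construction is safer than trying to replay it.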
