[PR] feat: Add code mode (Monty sandbox) to common.ai AgentOperator [airflow]

via GitHub Thu, 11 Jun 2026 13:43:12 -0700


kaxil opened a new pull request, #68407:
URL: https://github.com/apache/airflow/pull/68407


   ## Summary
   
   Adds an opt-in `code_mode=True` flag to `AgentOperator` (and `@task.agent`). When it is enabled, the agent's tools are wrapped in a single `run_code` tool, and the code passed to that tool runs in pydantic-ai's [Monty](https://github.com/pydantic/monty) sandbox. The dependency ships as the new `[code-mode]` extra (`pydantic-ai-harness[codemode]`). For multi-tool workflows, the model writes one Python snippet that calls the tools as functions (using loops, conditionals, and `asyncio.gather`) instead of making one model round-trip per tool call. This reduces both round-trips and token use.
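
The sketch below shows the kind of snippet a model might write for the prompt in the usage example. It is hypothetical: the tools `top_customers_by_orders` and `total_spend` and the in-memory `ORDERS` data stand in for what a toolset would expose. The snippet also runs here as ordinary Python, not inside Monty.

```python
import asyncio
from typing import Dict, List

# Hypothetical in-memory rows standing in for the "orders" table.
ORDERS = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 35.5},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 3, "amount": 10.0},
    {"customer_id": 2, "amount": 64.5},
    {"customer_id": 3, "amount": 25.0},
    {"customer_id": 3, "amount": 5.0},
    {"customer_id": 4, "amount": 999.0},
]


async def top_customers_by_orders(n: int) -> List[int]:
    """Hypothetical tool: ids of the n customers with the most orders."""
    counts: Dict[int, int] = {}
    for row in ORDERS:
        counts[row["customer_id"]] = counts.get(row["customer_id"], 0) + 1
    # Most orders first; ties broken by id so the result is deterministic.
    ranked = sorted(counts, key=lambda cid: (-counts[cid], cid))
    return ranked[:n]


async def total_spend(customer_id: int) -> float:
    """Hypothetical tool: sum of order amounts for one customer."""
    return sum(r["amount"] for r in ORDERS if r["customer_id"] == customer_id)


async def generated_snippet() -> Dict[int, float]:
    # Glue code a model might emit in ONE run_code call: rank the customers,
    # then fan out the dependent per-customer lookups concurrently.
    top = await top_customers_by_orders(3)
    spends = await asyncio.gather(*(total_spend(cid) for cid in top))
    return dict(zip(top, spends))


result = asyncio.run(generated_snippet())
print(result)
```

Without code mode, the model must see the ranking result before it can request the per-customer totals, so the work spans several model turns. With code mode, it is a single `run_code` call.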
   
   ## Design rationale
   
   - **Why an extra, not a base dependency.** `pydantic-monty` is pre-1.0 and fast-moving (it ships as a Rust/native wheel). Making it a hard dependency of every common.ai install would let its churn, or a missing wheel for some platform, break the whole provider for users who never touch code mode. Code mode is therefore gated behind the `code-mode` extra. Enabling it without the extra installed raises `AirflowOptionalProviderFeatureException`, which is the same pattern the provider already uses for `mcp`, `sql`, and `skills`.
   - **Why a `bool` flag rather than passing the capability through 
`agent_params`.** Capability instances aren't round-trip-safe through DAG 
serialization (see the existing "Capabilities" docs note). `code_mode` is a 
plain boolean; the `CodeMode` capability is constructed at execution time in 
`_build_agent`, never stored on the serialized operator.
   - **The tool boundary is unchanged.** `CodeMode` collapses the registered tools into one `run_code` tool. The generated code runs deny-by-default (no filesystem, network, or environment access) and can call only those tools, which still execute in the worker. Code mode changes *how* the model invokes tools, not *what* it can reach.
   - **Toolset return schemas.** `HookToolset` and `SQLToolset` now set `return_schema` on their tool definitions. As a result, the tool signatures that code mode shows the model end in `-> str` instead of `-> Any`. Both toolsets always return serialized strings (`_serialize_for_llm` / `json.dumps`), so `{"type": "string"}` is accurate. The `return_schema` kwarg is passed through a small version-guarded helper because `ToolDefinition.return_schema` was added in a pydantic-ai release newer than the provider's minimum supported version.
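
Two of the guards described above can be sketched in plain Python:

- failing loudly when the optional extra is missing;
- passing `return_schema` only where `ToolDefinition` supports it.

Everything in the sketch is hypothetical. `OptionalFeatureMissing` stands in for `AirflowOptionalProviderFeatureException`. The two dataclasses stand in for an older and a newer pydantic-ai `ToolDefinition` and are not its real field list. The helper detects the field directly instead of comparing version numbers, which may differ from the PR's actual version-guarded helper.

```python
import dataclasses
import importlib.util
from typing import Any, Dict, Optional


class OptionalFeatureMissing(Exception):
    """Stand-in for airflow.exceptions.AirflowOptionalProviderFeatureException."""


def require_module(module_name: str, extra: str) -> None:
    """Raise OptionalFeatureMissing if an optional dependency cannot be found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # find_spec raises instead of returning None when a parent package is missing.
        spec = None
    if spec is None:
        raise OptionalFeatureMissing(
            f"{module_name!r} is not installed; install the '{extra}' extra to use code mode."
        )


# Stand-in ToolDefinition shapes: one predating return_schema, one including it.
@dataclasses.dataclass
class OldToolDefinition:
    name: str
    parameters_json_schema: Dict[str, Any]


@dataclasses.dataclass
class NewToolDefinition(OldToolDefinition):
    return_schema: Optional[Dict[str, Any]] = None


def make_string_tool_definition(tool_def_cls: type, **kwargs: Any) -> Any:
    """Build a tool definition, declaring a string return type only where supported."""
    supported = {field.name for field in dataclasses.fields(tool_def_cls)}
    if "return_schema" in supported:
        # The toolsets always return serialized strings, so "string" is accurate.
        kwargs["return_schema"] = {"type": "string"}
    return tool_def_cls(**kwargs)


old = make_string_tool_definition(OldToolDefinition, name="query", parameters_json_schema={})
new = make_string_tool_definition(NewToolDefinition, name="query", parameters_json_schema={})
print(old)
print(new)
```

Detecting the field directly keeps the call site identical across pydantic-ai versions. Older versions simply never receive the kwarg they would reject.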
   
   ## Usage
   
   ```python
   AgentOperator(
       task_id="analyst",
       prompt="For the top 3 customers by order count, what was each one's total spend?",
       llm_conn_id="pydanticai_default",
       system_prompt="You are a SQL analyst. Write Python that calls the tools to answer.",
       toolsets=[SQLToolset(db_conn_id="postgres_default", allowed_tables=["customers", "orders"])],
       # Requires: pip install "apache-airflow-providers-common-ai[code-mode]"
       code_mode=True,
   )
   ```
   
   ## Gotchas / limitations
   
   - Incompatible with `durable=True`: durable replay caches individual 
model/tool steps via a shared step counter that assumes a stable call order 
across runs, which code mode's free-form generated Python does not guarantee. 
The combination is rejected at construction (mirroring the existing `durable` + 
`enable_hitl_review` guard).
   - Monty supports only a subset of Python and no third-party imports. It is meant to sandbox the glue code between tool calls, not to serve as a general-purpose code runtime.
   - Draft, opened for early review. The end-to-end `CodeMode` round-trip is exercised in a local Breeze spike because the harness is not installed in CI. The unit tests cover the provider-owned wiring (build-or-raise and capability injection) with the harness mocked.
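
The `durable=True` conflict can be illustrated with a toy replay cache keyed only by a step counter. This sketch rests on assumptions and is not the provider's durable implementation. `StepCache`, `attempt`, and `validate_agent_options` are all hypothetical, and raising `ValueError` is a guess at how a construction-time rejection might look.

```python
from typing import Callable, Dict, List


class StepCache:
    """Toy replay cache keyed only by call order (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.recorded: Dict[int, str] = {}
        self.step = 0

    def run(self, fn: Callable[[], str]) -> str:
        step = self.step
        self.step += 1
        if step in self.recorded:
            # Replay: assumes step N is the same call it was on the first attempt.
            return self.recorded[step]
        result = fn()
        self.recorded[step] = result
        return result


TOOLS: Dict[str, Callable[[], str]] = {
    "orders": lambda: "orders-result",
    "customers": lambda: "customers-result",
}


def attempt(cache: StepCache, call_order: List[str]) -> Dict[str, str]:
    cache.step = 0  # every attempt replays from the first step
    return {name: cache.run(TOOLS[name]) for name in call_order}


cache = StepCache()
first = attempt(cache, ["orders", "customers"])  # results recorded fresh
retry = attempt(cache, ["customers", "orders"])  # generated code picked another order
print(first)
print(retry)  # each name now maps to the other tool's cached result


def validate_agent_options(*, durable: bool, code_mode: bool) -> None:
    """Hypothetical construction-time guard rejecting durable + code mode."""
    if durable and code_mode:
        raise ValueError("code_mode=True cannot be combined with durable=True")
```

On retry, the cache silently hands back plausible-looking but mismatched results. Rejecting the combination when the operator is constructed is therefore safer than trying to detect the mismatch at run time.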
   


-- 
This is an automated message from the Apache Git Service.