kaxil opened a new pull request, #62904: URL: https://github.com/apache/airflow/pull/62904
(Part of https://github.com/orgs/apache/projects/586)

Adds MCP (Model Context Protocol) server support to the Common AI provider (AIP-99 Phase 5). Users can now connect AI agents to MCP servers — the open protocol that lets LLMs interact with external tools through a standardized interface.

Two new components:

- **`MCPToolset`** — resolves MCP server config from an Airflow connection and delegates to PydanticAI's MCP server classes. Stores URLs, auth tokens, and commands in Airflow connections/secret backends instead of hardcoding them in DAG code.
- **`MCPHook`** — dedicated `mcp` connection type with UI fields for transport (HTTP/SSE/stdio), command, args, and auth token.

## Design decisions

**Three tiers of toolset usage** — the docs and examples make clear that:

1. `MCPToolset` (recommended) — Airflow connection management, secret backends, connection UI
2. Direct PydanticAI MCP servers (`MCPServerStreamableHTTP`, `MCPServerStdio`) — for prototyping or full control
3. Any `AbstractToolset` — `AgentOperator` accepts any PydanticAI-compatible toolset, no lock-in

**Thin delegation, not reimplementation** — `MCPToolset` wraps PydanticAI's MCP servers and delegates `get_tools()`, `call_tool()`, and `__aenter__`/`__aexit__`. The lifecycle delegation keeps the MCP connection open across tool calls in a multi-turn agent conversation instead of reconnecting on every call.

**Auth via Bearer header** — the connection's password field is passed as `Authorization: Bearer <token>` to HTTP/SSE servers. Stdio transport doesn't use auth (the server runs as a local subprocess).

**`args` coercion** — if a user enters the `args` extra field as a bare string instead of a JSON array, it is treated as a single-element list rather than being split character by character.
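The thin-delegation pattern can be sketched as follows. The `StubMCPServer` and `DelegatingToolset` classes below are illustrative stand-ins, not PydanticAI's actual API; the point is that forwarding `__aenter__`/`__aexit__` to the inner server means one connection serves every tool call inside the context:

```python
import asyncio


class StubMCPServer:
    """Stand-in for a PydanticAI MCP server class (hypothetical)."""

    def __init__(self):
        self.connects = 0  # counts how many times we "open" the connection

    async def __aenter__(self):
        self.connects += 1
        return self

    async def __aexit__(self, *exc):
        return False

    async def call_tool(self, name, args):
        return f"{name}: ok"


class DelegatingToolset:
    """Thin wrapper in the spirit of MCPToolset: forward lifecycle and
    tool calls to the wrapped server instead of reimplementing MCP."""

    def __init__(self, server):
        self._server = server

    async def __aenter__(self):
        await self._server.__aenter__()
        return self

    async def __aexit__(self, *exc):
        return await self._server.__aexit__(*exc)

    async def call_tool(self, name, args):
        return await self._server.call_tool(name, args)


async def main():
    server = StubMCPServer()
    async with DelegatingToolset(server) as toolset:
        # Two tool calls in one multi-turn conversation reuse the
        # single open connection — no reconnect per call.
        await toolset.call_tool("list_files", {})
        await toolset.call_tool("run_code", {})
    return server.connects
```

Without the lifecycle delegation, each `call_tool` would have to open and tear down its own session, which is exactly the reconnect-per-call behavior the PR avoids.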
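A minimal sketch of the connection-resolution behavior described above (Bearer auth and `args` coercion). The function name `resolve_mcp_config` and the plain-dict connection are hypothetical simplifications; the real `MCPToolset` works against an Airflow `Connection` object and may differ in detail:

```python
import json


def resolve_mcp_config(conn: dict) -> dict:
    """Hypothetical resolver mirroring the behavior described in the PR."""
    extra = json.loads(conn.get("extra") or "{}")
    transport = extra.get("transport", "http")
    config = {"transport": transport}

    if transport == "stdio":
        raw_args = extra.get("args", [])
        # Coerce a bare string into a one-element list; naive list("abc")
        # would wrongly split the string into characters.
        config["command"] = extra.get("command")
        config["args"] = [raw_args] if isinstance(raw_args, str) else list(raw_args)
    else:
        # HTTP/SSE: the connection's password becomes a Bearer token.
        config["url"] = conn.get("host")
        if conn.get("password"):
            config["headers"] = {"Authorization": f"Bearer {conn['password']}"}

    return config
```

For example, an HTTP connection with a password yields an `Authorization` header, while a stdio connection whose `args` extra is the bare string `"mcp-run-python"` yields `["mcp-run-python"]`.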
## Usage

```python
from airflow.providers.common.ai.operators.agent import AgentOperator
from airflow.providers.common.ai.toolsets.mcp import MCPToolset

AgentOperator(
    task_id="mcp_agent",
    prompt="What tools are available?",
    llm_conn_id="pydantic_ai_default",
    toolsets=[
        MCPToolset(mcp_conn_id="my_mcp_server"),
        MCPToolset(mcp_conn_id="code_runner", tool_prefix="code"),
    ],
)
```

Connection config (HTTP):

```json
{"conn_type": "mcp", "host": "http://localhost:3001/mcp"}
```

Connection config (stdio):

```json
{"conn_type": "mcp", "extra": "{\"transport\": \"stdio\", \"command\": \"uvx\", \"args\": [\"mcp-run-python\"]}"}
```

## What's not included

- **No MCP resource/sampling/elicitation** — just tool exposure. Can add later.
- **No MCP server management** — Airflow doesn't start/stop MCP servers. HTTP servers run externally; stdio servers are spawned by PydanticAI as subprocesses.

Requires the `mcp` optional extra: `pip install "apache-airflow-providers-common-ai[mcp]"`

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
