This is an automated email from the ASF dual-hosted git repository.
kaxilnaik pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git
The following commit(s) were added to refs/heads/main by this push:
new be73e4f171a Add AGENTS.md to common AI provider (#62824)
be73e4f171a is described below
commit be73e4f171a6bed7810c0ae1860281405db8a4dc
Author: Kaxil Naik <[email protected]>
AuthorDate: Tue Mar 3 22:08:17 2026 +0000
Add AGENTS.md to common AI provider (#62824)
---
providers/common/ai/AGENTS.md | 60 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 60 insertions(+)
diff --git a/providers/common/ai/AGENTS.md b/providers/common/ai/AGENTS.md
new file mode 100644
index 00000000000..2c068d70891
--- /dev/null
+++ b/providers/common/ai/AGENTS.md
@@ -0,0 +1,60 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+# Common AI Provider — Agent Instructions
+
+This provider wraps [pydantic-ai](https://ai.pydantic.dev/) to connect Airflow pipelines to LLMs.
+The hook is a thin bridge between Airflow connections and pydantic-ai's model/provider abstractions.
+
+## Design Principles
+
+- **Delegate to pydantic-ai.** pydantic-ai supports 20+ providers (OpenAI, Anthropic, Google, Azure,
+ Bedrock, Ollama, etc.) via `infer_model()` and provider classes like `AzureProvider`, `BedrockProvider`.
+ Do not re-implement provider-specific logic that pydantic-ai handles.
+ Before writing new code, check: https://ai.pydantic.dev/models/
+- **Keep the hook thin.** `PydanticAIHook.get_conn()` maps Airflow connection fields to pydantic-ai
+ constructors. That is the hook's entire job. Do not add abstraction layers (builders, factories,
+ registries, Protocols) on top of pydantic-ai's own abstractions.
+- **No premature abstraction.** Do not add Protocols, builder patterns, or plugin systems for a single
+ code path. Wait until there are 3+ concrete use cases before introducing an abstraction.
+- **Operators stay focused.** Each operator does one thing: `LLMOperator` (prompt → output),
+ `LLMBranchOperator` (prompt → branch decision), `LLMSQLOperator` (prompt → validated SQL).
+
+## Adding Support for a New LLM Provider
+
+If pydantic-ai already supports the provider (check [models docs](https://ai.pydantic.dev/models/)):
+
+1. **Do nothing in this package.** Users set the `provider:model` string in their connection
+ (e.g. `azure:gpt-4o`, `bedrock:anthropic.claude-sonnet-4-20250514`) and the hook resolves it
+ via `infer_model()`.
+2. If the provider needs credentials beyond `api_key` and `base_url`, add a branch in
+ `get_conn()` using pydantic-ai's own provider class (e.g. `AzureProvider`).
+3. Update the connection form docs if new fields are needed.
+
+If pydantic-ai does *not* support the provider, contribute upstream to pydantic-ai rather than
+building a wrapper here.
+
+## Security
+
+- **No dynamic imports from connection extras.** Never use `importlib.import_module()` on
+ user-provided strings from connection fields. Connection extras are editable by users with
+ connection-edit permissions and must not become a code execution vector.
+- **SQL validation is on by default.** `LLMSQLOperator` validates generated SQL via AST parsing.
+ Do not disable this default.
+
+## Pitfalls
+
+- Do not construct raw provider SDK clients (e.g. `openai.AsyncAzureOpenAI`); use pydantic-ai's
+ provider classes, which handle client construction internally.
+- Do not add provider-specific connection types. The single `pydantic_ai` connection type works for
+ all providers via the `provider:model` format.
+- Use `from airflow.providers.common.compat.sdk import ...` for SDK imports, never
+ `from airflow.sdk import ...` directly.
+
+## Key Paths
+
+- Hook: `src/airflow/providers/common/ai/hooks/pydantic_ai.py`
+- Operators: `src/airflow/providers/common/ai/operators/`
+- Decorators: `src/airflow/providers/common/ai/decorators/`
+- Tests: `tests/unit/common/ai/`
+- Docs: `docs/`
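
The "Keep the hook thin" and "No dynamic imports from connection extras" rules in the AGENTS.md above can be illustrated with a minimal sketch. It is not the provider's actual `get_conn()` code. `FakeConnection`, `split_model_id`, and `resolve_provider_kwargs` are hypothetical names, and the `api_version` extra key is an assumption made for this example. The Azure keyword names are meant to mirror what pydantic-ai's `AzureProvider` is believed to accept and should be checked against the pydantic-ai docs. The sketch uses only the standard library so it runs without Airflow or pydantic-ai installed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class FakeConnection:
    """Hypothetical stand-in for an Airflow connection (not the real class)."""

    model: str  # "provider:model" string, e.g. "azure:gpt-4o"
    api_key: Optional[str] = None
    base_url: Optional[str] = None
    extra: Dict[str, Any] = field(default_factory=dict)


def split_model_id(model_id: str) -> Tuple[str, str]:
    """Split "provider:model" on the first colon only."""
    provider, sep, name = model_id.partition(":")
    if not (sep and provider and name):
        raise ValueError(f"expected 'provider:model', got {model_id!r}")
    return provider, name


def resolve_provider_kwargs(conn: FakeConnection) -> Dict[str, Any]:
    """Map connection fields to constructor keyword arguments.

    One explicit branch per provider that needs more than api_key/base_url.
    Nothing in conn.extra is ever imported or evaluated; only known keys
    are read, and unknown keys are ignored.
    """
    provider, _ = split_model_id(conn.model)
    if provider == "azure":
        # Assumed keyword names for pydantic-ai's AzureProvider.
        return {
            "api_key": conn.api_key,
            "azure_endpoint": conn.base_url,
            "api_version": conn.extra.get("api_version"),  # hypothetical extra key
        }
    # Default path: only the two generic credentials, skipping unset ones.
    common = {"api_key": conn.api_key, "base_url": conn.base_url}
    return {k: v for k, v in common.items() if v is not None}
```

An extra such as `{"provider_class": "some.module.Class"}` is simply never read here, which is the point of the security rule: a user who can edit connection extras cannot steer the hook into importing arbitrary code. A plain `if` branch is used instead of a registry or factory to stay in line with the "No premature abstraction" principle.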