This is an automated email from the ASF dual-hosted git repository.
kaxilnaik pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git
The following commit(s) were added to refs/heads/main by this push:
new 75444c274c5 Improve AGENTS.md with actionable development guidance (#62440)
75444c274c5 is described below
commit 75444c274c5b014126f5ca0079320b9f07e27341
Author: Kaxil Naik <[email protected]>
AuthorDate: Wed Feb 25 02:30:37 2026 +0000
Improve AGENTS.md with actionable development guidance (#62440)
I have been using AGENTS.md with various AI coding tools (Cursor,
Claude Code, Copilot) since late last year and have iterated on it
many times. Sharing this so everyone benefits from the same workflow
and so new contributors using AI tools get guardrails that prevent
common mistakes — the AI will enforce what humans sometimes skip.
Changes:
- Replace the sparse doc-index style with concrete environment setup,
commands, repo structure, architecture boundaries, coding/testing
standards, and commit conventions
- Add nested AGENTS.md files for Execution API (Cadwyn versioning)
and providers (layout, provider.yaml, dependency rules)
---
AGENTS.md | 101 ++++++++++++++++++---
.../airflow/api_fastapi/execution_api/AGENTS.md | 42 +++++++++
providers/AGENTS.md | 19 ++++
3 files changed, 147 insertions(+), 15 deletions(-)
diff --git a/AGENTS.md b/AGENTS.md
index 8a6c3050062..456a87ca4b2 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -3,29 +3,100 @@
# AGENTS instructions
-The main developer documentation lives in the `contributing-docs` directory. The following points summarise how to set up the environment, run checks, build docs and follow the PR workflow.
+## Environment Setup
-## Local virtualenv and Breeze
+- Install prek: `uv tool install prek`
+- Enable commit hooks: `prek install`
+- **Never run pytest, python, or airflow commands directly on the host** — always use `breeze`.
+- Place temporary scripts in `dev/` (mounted as `/opt/airflow/dev/` inside Breeze).
-- [`07_local_virtualenv.rst`](contributing-docs/07_local_virtualenv.rst) explains how to prepare a local Python environment using `uv`. The tool creates and syncs a `.venv` and installs dependencies with commands such as `uv venv` and `uv sync`.
-- [`06_development_environments.rst`](contributing-docs/06_development_environments.rst) compares the local virtualenv with the Docker based Breeze environment. Breeze replicates CI and includes services like databases for integration tests.
+## Commands
-## Prek hooks
+- **Run a single test:** `breeze run pytest path/to/test.py::TestClass::test_method -xvs`
+- **Run a test file:** `breeze run pytest path/to/test.py -xvs`
+- **Run a Python script:** `breeze run python dev/my_script.py`
+- **Run Airflow CLI:** `breeze run airflow dags list`
+- **Type-check:** `breeze run mypy path/to/code`
+- **Lint/format (runs on host):** `prek run --all-files`
+- **Lint with ruff only:** `prek run ruff --all-files`
+- **Format with ruff only:** `prek run ruff-format --all-files`
+- **Build docs:** `breeze build-docs`
-- Installation and usage of `prek` are described in [`03a_contributors_quick_start_beginners.rst`](contributing-docs/03a_contributors_quick_start_beginners.rst). Install with `uv tool install prek` and run checks via `prek --all-files`.
-- [`08_static_code_checks.rst`](contributing-docs/08_static_code_checks.rst) provides more details on the available hooks and prerequisites. Enable the hooks with `prek install` so they run automatically on each commit.
+SQLite is the default backend. Use `--backend postgres` or `--backend mysql` for integration tests that need those databases. If Docker networking fails, run `docker network prune`.
-## Running tests
+## Repository Structure
-- [`03a_contributors_quick_start_beginners.rst`](contributing-docs/03a_contributors_quick_start_beginners.rst) shows running tests inside Breeze. Use `pytest` inside the container for individual files or invoke `breeze testing` commands to run full suites, e.g. `breeze --backend postgres --python 3.10 testing tests --test-type All`.
+UV workspace monorepo. Key paths:
-## Building documentation
+- `airflow-core/src/airflow/` — core scheduler, API, CLI, models
+ - `models/` — SQLAlchemy models (DagModel, TaskInstance, DagRun, Asset, etc.)
+ - `jobs/` — scheduler, triggerer, Dag processor runners
+ - `api_fastapi/core_api/` — public REST API v2, UI endpoints
+ - `api_fastapi/execution_api/` — task execution communication API
+ - `dag_processing/` — Dag parsing and validation
+ - `cli/` — command-line interface
+ - `ui/` — React/TypeScript web interface (Vite)
+- `task-sdk/` — lightweight SDK for Dag authoring and task execution runtime
+ - `src/airflow/sdk/execution_time/` — task runner, supervisor
+- `providers/` — 100+ provider packages, each with its own `pyproject.toml`
+- `airflow-ctl/` — management CLI tool
+- `chart/` — Helm chart for Kubernetes deployment
-- Documentation can be built locally using `uv run --group docs build-docs` as described in [`11_documentation_building.rst`](contributing-docs/11_documentation_building.rst). Within Breeze the equivalent command is `breeze build-docs`.
+## Architecture Boundaries
-## Pull request guidelines
+1. Users author Dags with the Task SDK (`airflow.sdk`).
+2. Dag Processor parses Dag files in isolated processes and stores serialized Dags in the metadata DB.
+3. Scheduler reads serialized Dags — **never runs user code** — and creates Dag runs / task instances.
+4. Workers execute tasks via Task SDK and communicate with the API server through the Execution API — **never access the metadata DB directly**.
+5. API Server serves the React UI and handles all client-database interactions.
+6. Triggerer evaluates deferred tasks/sensors in isolated processes.
-- Follow the PR guidance in [`05_pull_requests.rst`](contributing-docs/05_pull_requests.rst). Always add tests, keep your branch rebased instead of merged, and adhere to the commit message recommendations from [cbea.ms/git-commit](https://cbea.ms/git-commit/).
+## Coding Standards
-For advanced topics such as packaging providers and API versioning see [`12_provider_distributions.rst`](contributing-docs/12_provider_distributions.rst) and [`19_execution_api_versioning.rst`](contributing-docs/19_execution_api_versioning.rst).
+- No `assert` in production code.
+- `time.monotonic()` for durations, not `time.time()`.
+- In `airflow-core`, functions with a `session` parameter must not call `session.commit()`. Use keyword-only `session` parameters.
+- Imports at top of file. Valid exceptions: circular imports, lazy loading for worker isolation, `TYPE_CHECKING` blocks.
+- Guard heavy type-only imports (e.g., `kubernetes.client`) with `TYPE_CHECKING` in multi-process code paths.
+- Apache License header on all new files (prek enforces this).
+
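Taken together, a minimal sketch of those rules (the helper name and query below are made up for illustration; only the shape matters):

```python
from __future__ import annotations

import time
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Heavy, type-only import: visible to mypy, never paid for at runtime.
    from sqlalchemy.orm import Session


def dag_ids_parsed_before(cutoff: float, *, session: Session) -> list[str]:
    """`session` is keyword-only, and this function only reads --
    the caller owns commit/rollback."""
    started = time.monotonic()  # monotonic for durations, not time.time()
    rows = session.execute(
        "SELECT dag_id FROM dag WHERE last_parsed < :c", {"c": cutoff}
    )
    _ = time.monotonic() - started  # e.g. feed a duration metric
    return [row[0] for row in rows]
```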
+## Testing Standards
+
+- Add tests for new behavior — cover success, failure, and edge cases.
+- Use pytest patterns, not `unittest.TestCase`.
+- Use `spec`/`autospec` when mocking.
+- Use `time_machine` for time-dependent tests.
+- Use `@pytest.mark.parametrize` for multiple similar inputs.
+- Test fixtures: `devel-common/src/tests_common/pytest_plugin.py`.
+- Test location mirrors source: `airflow/cli/cli_parser.py` → `tests/cli/test_cli_parser.py`.
+
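As a concrete example of the `spec`/`autospec` rule (the `DagRunClient` class below is invented for illustration, not a real Airflow API):

```python
from unittest import mock


class DagRunClient:
    """Stand-in class used only to demonstrate autospec."""

    def trigger(self, dag_id: str, run_id: str) -> str:
        return f"{dag_id}/{run_id}"


# create_autospec mirrors the real signature, so a test that drifts
# out of sync with the code under test fails loudly instead of passing.
client = mock.create_autospec(DagRunClient, instance=True)
client.trigger("example_dag", "manual_1")
client.trigger.assert_called_once_with("example_dag", "manual_1")

rejected = False
try:
    client.trigger("example_dag", "manual_1", "extra")  # wrong arity
except TypeError:
    rejected = True  # a bare MagicMock would have accepted this silently
```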
+## Commits and PRs
+
+Write commit messages focused on user impact, not implementation details.
+
+- **Good:** `Fix airflow dags test command failure without serialized Dags`
+- **Good:** `UI: Fix Grid view not refreshing after task actions`
+- **Bad:** `Initialize DAG bundles in CLI get_dag function`
+
+Add a newsfragment for user-visible changes:
+`echo "Brief description" > airflow-core/newsfragments/{PR_NUMBER}.{bugfix|feature|improvement|doc|misc|significant}.rst`
+
+## Boundaries
+
+- **Ask first**
+ - Large cross-package refactors.
+ - New dependencies with broad impact.
+ - Destructive data or migration changes.
+- **Never**
+ - Commit secrets, credentials, or tokens.
+ - Edit generated files by hand when a generation workflow exists.
+ - Use destructive git operations unless explicitly requested.
+
+## References
+
+- [`contributing-docs/03a_contributors_quick_start_beginners.rst`](contributing-docs/03a_contributors_quick_start_beginners.rst)
+- [`contributing-docs/05_pull_requests.rst`](contributing-docs/05_pull_requests.rst)
+- [`contributing-docs/07_local_virtualenv.rst`](contributing-docs/07_local_virtualenv.rst)
+- [`contributing-docs/08_static_code_checks.rst`](contributing-docs/08_static_code_checks.rst)
+- [`contributing-docs/12_provider_distributions.rst`](contributing-docs/12_provider_distributions.rst)
+- [`contributing-docs/19_execution_api_versioning.rst`](contributing-docs/19_execution_api_versioning.rst)
diff --git a/airflow-core/src/airflow/api_fastapi/execution_api/AGENTS.md b/airflow-core/src/airflow/api_fastapi/execution_api/AGENTS.md
new file mode 100644
index 00000000000..56e2baca439
--- /dev/null
+++ b/airflow-core/src/airflow/api_fastapi/execution_api/AGENTS.md
@@ -0,0 +1,42 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+# Execution API — Agent Instructions
+
+## Versioning
+
+This API uses [Cadwyn](https://github.com/zmievsa/cadwyn) with CalVer (`vYYYY_MM_DD.py`).
+
+Workers and API servers deploy independently, so backward compatibility is critical — older clients must work with newer servers.
+
+### When Making Changes
+
+1. Check the latest version file in `versions/`. If its date is in the future (unreleased), add your `VersionChange` class to that file. Otherwise create a new `vYYYY_MM_DD.py`.
+2. Update the version bundle in `versions/__init__.py` only when creating a new file.
+3. Regenerate Task SDK models:
+
+```bash
+cd task-sdk && python dev/generate_task_sdk_models.py
+```
+
+4. Add tests for both the new and previous API versions.
+
+### Common Patterns
+
+- New schema field: `schema(Model).field("name").didnt_exist`
+- New endpoint: `endpoint("/path", ["GET"]).didnt_exist`
+- Response changes: implement `@convert_response_to_previous_version_for(Model)` and check field existence before popping.
+
+### Pitfalls
+
+- Don't use keyword arguments with `endpoint()` — use positional: `endpoint("/path", ["GET"])`.
+- Don't add changes to already-released version files.
+- Don't forget response converters for new fields in nested objects.
+
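Putting the patterns together, a rough sketch of what one version-change file might contain. This is pseudocode: the import paths, model, and field names are illustrative, not copied from the repo, so mirror an existing file in `versions/` for the exact form.

```python
# versions/v2099_01_01.py -- hypothetical, for illustration only
from cadwyn.structure import (
    VersionChange,
    convert_response_to_previous_version_for,
    schema,
)

from airflow.api_fastapi.execution_api.datamodels.taskinstance import TIRunContext


class AddDurationField(VersionChange):
    description = "Added `duration` to TIRunContext responses."
    instructions_to_migrate_to_previous_version = (
        schema(TIRunContext).field("duration").didnt_exist,
    )

    @convert_response_to_previous_version_for(TIRunContext)
    def drop_duration(response) -> None:
        # Check existence before popping (see Pitfalls above).
        response.body.pop("duration", None)
```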
+### Key Paths
+
+- Models: `datamodels/`
+- Routes: `routes/`
+- Versions: `versions/`
+- Task SDK generated models: `task-sdk/src/airflow/sdk/api/datamodels/_generated.py`
+- Full versioning guide: [`contributing-docs/19_execution_api_versioning.rst`](../../../../contributing-docs/19_execution_api_versioning.rst)
diff --git a/providers/AGENTS.md b/providers/AGENTS.md
new file mode 100644
index 00000000000..71278fdde78
--- /dev/null
+++ b/providers/AGENTS.md
@@ -0,0 +1,19 @@
+<!-- SPDX-License-Identifier: Apache-2.0
+ https://www.apache.org/licenses/LICENSE-2.0 -->
+
+# Providers — Agent Instructions
+
+Each provider is an independent package with its own `pyproject.toml`, tests, and documentation.
+
+## Structure
+
+- `provider.yaml` — metadata, dependencies, and configuration for the provider.
+- Building blocks: Hooks, Operators, Sensors, Transfers.
+- Use `version_compat.py` patterns for cross-version compatibility.
+
+## Checklist
+
+- Keep `provider.yaml` metadata, docs, and tests in sync.
+- Don't upper-bound dependencies by default; add limits only with justification.
+- Tests live alongside the provider — mirror source paths in test directories.
+- Full guide: [`contributing-docs/12_provider_distributions.rst`](../contributing-docs/12_provider_distributions.rst)