(airflow) branch main updated: Strengthen Copilot review instructions to use imperative "Flag any" phrasing (#62584)

kaxilnaik Fri, 27 Feb 2026 12:39:02 -0800

This is an automated email from the ASF dual-hosted git repository.

kaxilnaik pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git



The following commit(s) were added to refs/heads/main by this push:
     new a06453823a1 Strengthen Copilot review instructions to use imperative 
"Flag any" phrasing (#62584)
a06453823a1 is described below

commit a06453823a13960387d73210253fc729504396ca
Author: Kaxil Naik <[email protected]>
AuthorDate: Fri Feb 27 20:38:48 2026 +0000

    Strengthen Copilot review instructions to use imperative "Flag any" 
phrasing (#62584)
    
    Copilot's review agent converts rules phrased as "Flag any X" into
    active code searches but treats passive phrasing as informational
    context. Rewrite all critical rules (imports, assert, time.time,
    lru_cache, session.commit, N+1 queries, unittest, unspec'd mocks)
    to use imperative directives so the agent reliably flags violations.
---
 .github/instructions/code-review.instructions.md | 52 ++++++++++++------------
 1 file changed, 27 insertions(+), 25 deletions(-)

diff --git a/.github/instructions/code-review.instructions.md 
b/.github/instructions/code-review.instructions.md
index 9e9a019f1a6..0d4ce8a8791 100644
--- a/.github/instructions/code-review.instructions.md
+++ b/.github/instructions/code-review.instructions.md
@@ -10,45 +10,49 @@ Use these rules when reviewing pull requests to the Apache 
Airflow repository.
 ## Architecture Boundaries
 
 - **Scheduler must never run user code.** It only processes serialized Dags. 
Flag any scheduler-path code that deserializes or executes Dag/task code.
-- **Workers must not access the metadata DB directly.** Task execution 
communicates with the API server through the Execution API (`/execution` 
endpoints) only.
-- **Dag Processor and Triggerer run user code in isolated processes.** Code in 
these components should maintain that isolation.
-- **Providers must not import core internals** like `SUPERVISOR_COMMS` or 
task-runner plumbing. Providers interact through the public SDK and execution 
API only.
+- **Flag any task execution code that accesses the metadata DB directly** 
instead of through the Execution API (`/execution` endpoints).
+- **Flag any code in Dag Processor or Triggerer that breaks process 
isolation** — these components run user code in isolated processes.
+- **Flag any provider importing core internals** like `SUPERVISOR_COMMS` or 
task-runner plumbing. Providers interact through the public SDK and execution 
API only.
 
 ## Database and Query Correctness
 
-- **N+1 queries**: Flag SQLAlchemy queries that access relationships inside 
loops without `joinedload()` or `selectinload()`.
-- **`run_id` is only unique per Dag.** Queries that group, partition, or join 
on `run_id` alone (without `dag_id`) will collide across Dags. Always require 
`(dag_id, run_id)` together.
-- **Cross-database compatibility**: SQL changes must work on PostgreSQL, 
MySQL, and SQLite. Flag database-specific features (lateral joins, window 
functions) without cross-DB handling.
-- **Session discipline**: In `airflow-core`, functions receiving a `session` 
parameter must not call `session.commit()`. Use keyword-only `session` 
parameters.
+- **Flag any SQLAlchemy relationship access inside a loop** without 
`joinedload()` or `selectinload()` — this is an N+1 query.
+- **Flag any query on `run_id` without `dag_id`.** `run_id` is only unique per 
Dag. Queries that filter, group, partition, or join on `run_id` alone will 
silently collide across Dags.
+- **Flag any `session.commit()` call in `airflow-core`** code that receives a 
`session` parameter. Session lifecycle is managed by the caller, not the callee.
+- **Flag any `session` parameter that is not keyword-only** (`*, session`) in 
`airflow-core`.
+- **Flag any database-specific SQL** (e.g., `LATERAL` joins, PostgreSQL-only 
functions, MySQL-only syntax) without cross-DB handling. SQL must work on 
PostgreSQL, MySQL, and SQLite.
 
 ## Code Quality Rules
 
-- No `assert` in production code (stripped in optimized Python).
-- `time.monotonic()` for durations, not `time.time()`.
-- Imports at top of file. Valid exceptions: circular imports, lazy loading for 
worker isolation, `TYPE_CHECKING` blocks.
-- Guard heavy type-only imports (e.g., `kubernetes.client`) with 
`TYPE_CHECKING` in multi-process code paths.
-- Unbounded caches are bugs: all `@lru_cache` must have `maxsize`.
-- Resources (files, connections, sessions) must use context managers or 
`try/finally`.
+- **Flag any `assert` in non-test code.** `assert` is stripped in optimized 
Python (`python -O`), making it a silent no-op in production.
+- **Flag any `time.time()` used for measuring durations.** Use 
`time.monotonic()` instead — `time.time()` is affected by system clock 
adjustments.
+- **Flag any `from` or `import` statement inside a function or method body.** 
Imports must be at the top of the file. The only valid exceptions are: (1) 
circular import avoidance, (2) lazy loading for worker isolation, (3) 
`TYPE_CHECKING` blocks. If the import is inside a function, ask the author to 
justify why it cannot be at module level.
+- **Flag any `@lru_cache(maxsize=None)`.** This creates an unbounded cache — 
every unique argument set is cached forever. Note: `@lru_cache()` without 
arguments defaults to `maxsize=128` and is fine.
+- **Flag any heavy import** (e.g., `kubernetes.client`) in multi-process code 
paths that is not behind a `TYPE_CHECKING` guard.
+- **Flag any file, connection, or session opened without a context manager or 
`try/finally`.**
 
 ## Testing Requirements
 
-- New behavior requires tests covering success, failure, and edge cases.
-- Use pytest patterns, not `unittest.TestCase`.
-- Use `spec`/`autospec` when mocking.
-- Use `time_machine` for time-dependent tests.
-- Imports belong at the top of test files, not inside test functions.
-- Issue numbers do not belong in test docstrings.
+- **Flag any new public method or behavior without corresponding tests.** 
Tests must cover success, failure, and edge cases.
+- **Flag any `unittest.TestCase` subclass.** Use pytest patterns instead.
+- **Flag any `mock.Mock()` or `mock.MagicMock()` without `spec` or 
`autospec`.** Unspec'd mocks silently accept any attribute access, hiding real 
bugs.
+- **Flag any `time.sleep` or `datetime.now()` in tests.** Use `time_machine` 
for time-dependent tests.
+- **Flag any issue number in test docstrings** (e.g., `"""Fix for #12345"""`) 
— test names should describe behavior, not track tickets.
 
 ## API Correctness
 
-- `map_index` must be handled correctly for mapped tasks. Queries without 
`map_index` filtering may return arbitrary task instances.
-- Execution API changes must follow Cadwyn versioning (CalVer format).
+- **Flag any query on mapped task instances that does not filter on 
`map_index`.** Without it, queries may return arbitrary instances from the 
mapped set.
+- **Flag any Execution API change without a Cadwyn version migration** (CalVer 
format).
 
 ## UI Code (React/TypeScript)
 
 - Avoid `useState + useEffect` to sync derived state. Use nullish coalescing 
or nullable override patterns instead.
 - Extract shared logic into custom hooks rather than copy-pasting across 
components.
 
+## Generated Files
+
+- **Flag any manual edits to files in `openapi-gen/`** or Task SDK generated 
models. These must be regenerated, not hand-edited.
+
 ## AI-Generated Code Signals
 
 Flag these patterns that indicate low-quality AI-generated contributions:
@@ -63,7 +67,5 @@ Flag these patterns that indicate low-quality AI-generated 
contributions:
 
 ## Quality Signals to Check
 
-The absence of these signals in a "fix" or "optimization" PR is itself a red 
flag:
-
-- **Bug fixes need regression tests**: A test that fails without the fix and 
passes with it.
-- **Existing tests must still pass without modification**: If existing tests 
need changes to pass, the PR may introduce a behavioral regression.
+- **For bug-fix PRs, flag if there is no regression test** — a test that fails 
without the fix and passes with it.
+- **Flag any existing test modified to accommodate new behavior** — this may 
indicate a behavioral regression rather than a genuine fix.

(airflow) branch main updated: Strengthen Copilot review instructions to use imperative "Flag any" phrasing (#62584)

Reply via email to