Copilot commented on code in PR #63876:
URL: https://github.com/apache/airflow/pull/63876#discussion_r3025353460


##########
contributing-docs/11_documentation_building.rst:
##########
@@ -241,6 +241,17 @@ For example:
 
 Will build ``fab`` provider documentation and clean build artifacts before.
 
+.. agent-skill::
+   :id: build-docs
+   :category: documentation
+   :description: Build Airflow documentation. Pass package to build only that provider's docs (much faster). Runs via breeze on host or inside breeze.
+   :local: breeze build-docs
+   :breeze: breeze build-docs
+   :fallback: breeze build-docs --package-filter {package}

Review Comment:
   The `build-docs` skill currently resolves to `breeze build-docs` even on the 
host, but this same document describes `uv run --group docs build-docs` as the 
faster/default local workflow and positions `breeze build-docs` as a fallback 
when local building fails. Also, the optional `package` param is only 
referenced in the non-preferred `:fallback:` step, which cannot be selected by 
simply providing `package=...` (so the param is effectively unusable). Suggest 
updating the skill to use the `uv run --group docs build-docs` command for the 
host path (including optional package selection), and reserving Breeze for 
genuine fallback / in-container execution.
   ```suggestion
      :description: Build Airflow documentation. Pass package to build only that provider's docs (much faster). Uses ``uv run --group docs build-docs`` on the host and Breeze inside the container or as a fallback.
      :local: uv run --group docs build-docs {package}
      :breeze: breeze build-docs {package}
      :fallback: breeze build-docs {package}
   ```
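
   A quick illustration of why the optional `package` parameter interacts badly with plain `str.format` substitution: omitting the key raises `KeyError` instead of dropping the argument. The `render` helper and its fall-back-to-empty behavior below are hypothetical, not code from the PR:

   ```python
   # Illustrative only: a trailing optional {package} placeholder must still be
   # supplied to str.format(), or the call raises KeyError. The render() helper
   # sketches one way to treat missing placeholders as optional.
   template = "uv run --group docs build-docs {package}"

   def render(template: str, **params: str) -> str:
       try:
           return template.format(**params).strip()
       except KeyError as exc:
           # Treat the missing placeholder as optional: substitute empty string.
           missing = exc.args[0]
           return render(template, **{**params, missing: ""})

   print(render(template, package="apache-airflow-providers-fab"))
   # -> uv run --group docs build-docs apache-airflow-providers-fab
   print(render(template))
   # -> uv run --group docs build-docs
   ```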



##########
scripts/ci/prek/check_agent_skills_valid.py:
##########
@@ -0,0 +1,148 @@
+#!/usr/bin/env python
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Validate agent_skills.rst: no duplicate IDs, no missing required fields, 
valid contexts.
+
+Runs as a prek (pre-commit) hook. Exits 1 if any violation is found so that
+contributors cannot commit a malformed skill definition.
+
+Checked rules:
+  - Every ``.. agent-skill::`` directive must have an ``id`` field.
+  - Every skill must have at least one of ``local`` or ``breeze`` (a step to execute).
+  - Skill IDs must be unique across all source files.
+  - ``params`` values, if present, must end with ``:required`` or ``:optional``.
+"""
+
+from __future__ import annotations
+
+import re
+import sys
+from pathlib import Path
+
+REPO_ROOT = Path(__file__).resolve().parents[3]
+
+# Must mirror AGENT_SKILLS_RST_FILES in context_detect.py exactly.
+AGENT_SKILLS_RST_FILES: list[Path] = [
+    REPO_ROOT / "contributing-docs" / 
"03a_contributors_quick_start_beginners.rst",
+    REPO_ROOT / "contributing-docs" / "08_static_code_checks.rst",
+    REPO_ROOT / "contributing-docs" / "11_documentation_building.rst",
+    REPO_ROOT / "contributing-docs" / "testing" / "unit_tests.rst",
+]
+

Review Comment:
   `AGENT_SKILLS_RST_FILES` is duplicated here and must be kept in sync with 
`context_detect.py` (the file even notes this). This creates an easy drift 
point when adding/removing contributing-docs sources. Consider importing the 
list from `ci.prek.context_detect` (or moving it to a shared module/constants 
file) so there’s a single source of truth.
   ```suggestion
   from ci.prek.context_detect import AGENT_SKILLS_RST_FILES
   
   REPO_ROOT = Path(__file__).resolve().parents[3]
   ```
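
   If a direct import across `scripts/ci/prek` proves awkward (these scripts may be executed standalone rather than as a package), a lighter alternative is a test-time guard that loads both modules by file path and asserts the duplicated constant is equal. The `load_constant` helper below is a hypothetical sketch, not part of the PR:

   ```python
   import importlib.util
   from pathlib import Path

   def load_constant(py_file: Path, name: str):
       """Import a module directly from its file path and return one attribute.

       Lets a test compare duplicated constants across scripts without the
       scripts directory being an importable package. Hypothetical helper.
       """
       spec = importlib.util.spec_from_file_location(py_file.stem, py_file)
       module = importlib.util.module_from_spec(spec)
       spec.loader.exec_module(module)
       return getattr(module, name)

   # Usage sketch: assert the two copies stay in sync.
   # prek_dir = Path("scripts/ci/prek")
   # assert (load_constant(prek_dir / "context_detect.py", "AGENT_SKILLS_RST_FILES")
   #         == load_constant(prek_dir / "check_agent_skills_valid.py", "AGENT_SKILLS_RST_FILES"))
   ```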



##########
scripts/ci/prek/context_detect.py:
##########
@@ -0,0 +1,510 @@
+#!/usr/bin/env python
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Runtime host vs Breeze container detection for AI agent skills.
+
+``.. agent-skill::`` directives are embedded directly in contributing-docs RST
+files — the source of truth. A generator extracts them into ``skills.json`` so
+agents can read a static artifact without parsing RST at every invocation.
+
+Source files scanned (see AGENT_SKILLS_RST_FILES):
+  contributing-docs/03a_contributors_quick_start_beginners.rst
+  contributing-docs/08_static_code_checks.rst
+  contributing-docs/11_documentation_building.rst
+  contributing-docs/testing/unit_tests.rst
+
+Generated artifact (see AGENT_SKILLS_JSON):
+  .github/skills/airflow-contributor/assets/skills.json
+
+The JSON is regenerated with ``--generate`` and validated against the RST with
+``--check`` (run as a prek hook so drift is caught at commit time).
+
+Detection priority chain:
+  1. AIRFLOW_BREEZE_CONTAINER env var set  -> "breeze"
+  2. /.dockerenv file exists               -> "breeze"
+  3. /opt/airflow path exists              -> "breeze"
+  4. (default)                             -> "host"
+
+Usage (importable):
+    from ci.prek.context_detect import get_context, get_command
+
+    ctx = get_context()                    # "host" or "breeze"
+    cmd = get_command("run-single-test",   # raises if skill not found
+                      project="providers/amazon",
+                      test_path="providers/amazon/tests/test_s3.py")
+
+Usage (CLI):
+    python scripts/ci/prek/context_detect.py --list
+    python scripts/ci/prek/context_detect.py --generate        # write skills.json
+    python scripts/ci/prek/context_detect.py --check           # drift check (CI)
+    python scripts/ci/prek/context_detect.py run-single-test project=providers/vertica test_path=...
+    python scripts/ci/prek/context_detect.py run-single-test --context breeze test_path=...
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import re
+import sys
+from pathlib import Path
+
+REPO_ROOT = Path(__file__).resolve().parents[3]
+
+# Contributing-docs files that contain embedded ``.. agent-skill::`` directives.
+# Skills live next to the human-readable workflow they describe so updates are atomic.
+AGENT_SKILLS_RST_FILES: list[Path] = [
+    REPO_ROOT / "contributing-docs" / "03a_contributors_quick_start_beginners.rst",
+    REPO_ROOT / "contributing-docs" / "08_static_code_checks.rst",
+    REPO_ROOT / "contributing-docs" / "11_documentation_building.rst",
+    REPO_ROOT / "contributing-docs" / "testing" / "unit_tests.rst",
+]
+
+_DIRECTIVE_RE = re.compile(r"^\.\.\s+agent-skill::\s*$")
+_OPTION_RE = re.compile(r"^\s+:([^:]+):\s+(.+)$")
+
+# Committed JSON artifact — agent-facing interface generated from the RST sources.
+AGENT_SKILLS_JSON: Path = REPO_ROOT / ".github" / "skills" / "airflow-contributor" / "assets" / "skills.json"
+
+
+# ---------------------------------------------------------------------------
+# Context detection
+# ---------------------------------------------------------------------------
+
+
+def get_context() -> str:
+    """Return 'breeze' if running inside the Breeze container, else 'host'.
+
+    Detection priority:
+      1. AIRFLOW_BREEZE_CONTAINER env var
+      2. /.dockerenv file (set by Docker runtime)
+      3. /opt/airflow directory (Breeze mount point)
+      4. default: host
+    """
+    if os.environ.get("AIRFLOW_BREEZE_CONTAINER"):
+        return "breeze"
+    if Path("/.dockerenv").exists():
+        return "breeze"
+    if Path("/opt/airflow").exists():
+        return "breeze"
+    return "host"
+
+
+# ---------------------------------------------------------------------------
+# RST parsing
+# ---------------------------------------------------------------------------
+
+
+def _collect_options(lines: list[str], start: int) -> tuple[dict[str, str], int]:
+    opts: dict[str, str] = {}
+    i = start
+    while i < len(lines):
+        m = _OPTION_RE.match(lines[i])
+        if m:
+            opts[m.group(1)] = m.group(2).strip()
+            i += 1
+        elif lines[i].strip() == "" or not lines[i].startswith(" "):
+            break
+        else:
+            i += 1
+    return opts, i
+
+
+def _parse_skills(rst_path: Path) -> list[dict]:
+    """Parse all ``.. agent-skill::`` directives from rst_path.
+
+    Returns a list of skill dicts with fields:
+      id, category, description, steps, prereqs, params, expected_output
+
+    Each step dict has: context ("host"|"breeze"), command, preferred (bool).
+    A skill with both :local: and :fallback: produces two host steps —
+    preferred=True for local, preferred=False for fallback.
+
+    Raises:
+        FileNotFoundError: if rst_path does not exist.
+    """
+    if not rst_path.exists():
+        raise FileNotFoundError(f"agent_skills.rst not found at {rst_path}")
+
+    lines = rst_path.read_text(encoding="utf-8").splitlines()
+    skills: list[dict] = []
+    i = 0
+    while i < len(lines):
+        if _DIRECTIVE_RE.match(lines[i]):
+            opts, i = _collect_options(lines, i + 1)
+            skill_id = opts.get("id", "").strip()
+            if not skill_id:
+                continue
+
+            steps: list[dict] = []
+            local_cmd = opts.get("local", "").strip()
+            fallback_cmd = opts.get("fallback", "").strip()
+            breeze_cmd = opts.get("breeze", "").strip()
+
+            if local_cmd:
+                steps.append({"context": "host", "command": local_cmd, "preferred": True})
+            if fallback_cmd:
+                steps.append({"context": "host", "command": fallback_cmd, "preferred": False})
+            if breeze_cmd:
+                steps.append({"context": "breeze", "command": breeze_cmd, "preferred": True})
+
+            prereqs = [p.strip() for p in opts.get("prereqs", "").split(",") if p.strip()]
+            params: dict[str, bool] = {}
+            for raw_param in opts.get("params", "").split(","):
+                raw = raw_param.strip()
+                if ":" in raw:
+                    name, req = raw.split(":", 1)
+                    params[name.strip()] = req.strip() == "required"
+
+            skills.append(
+                {
+                    "id": skill_id,
+                    "category": opts.get("category", ""),
+                    "description": opts.get("description", ""),
+                    "steps": steps,
+                    "prereqs": prereqs,
+                    "params": params,
+                    "expected_output": opts.get("expected-output", ""),
+                }
+            )
+        else:
+            i += 1
+
+    return skills
+
+
+def _parse_skills_from_files(rst_paths: list[Path] = AGENT_SKILLS_RST_FILES) -> list[dict]:
+    """Parse ``.. agent-skill::`` directives from all contributing-docs source files.
+
+    Merges results across files. Raises ValueError if the same skill id appears
+    in more than one file (duplicate IDs are a configuration error).
+
+    Raises:
+        FileNotFoundError: if any path in rst_paths does not exist.
+        ValueError: if a duplicate skill id is found across files.
+    """
+    all_skills: list[dict] = []
+    seen: dict[str, Path] = {}
+    for path in rst_paths:
+        for skill in _parse_skills(path):
+            sid = skill["id"]
+            if sid in seen:
+                raise ValueError(
+                    f"Duplicate skill id '{sid}' found in {path} (already 
defined in {seen[sid]})"
+                )
+            seen[sid] = path
+            all_skills.append(skill)
+    return all_skills
+
+
+def _find_skill(skill_id: str, skills: list[dict]) -> dict:
+    """Return skill dict by id. Raises KeyError if not found."""
+    for skill in skills:
+        if skill.get("id") == skill_id:
+            return skill
+    known = [s.get("id") for s in skills]
+    raise KeyError(f"Skill '{skill_id}' not found. Available: {known}")
+
+
+# ---------------------------------------------------------------------------
+# JSON artifact: generate and drift-check
+# ---------------------------------------------------------------------------
+
+
+def _load_skills(
+    json_path: Path = AGENT_SKILLS_JSON,
+    rst_paths: list[Path] = AGENT_SKILLS_RST_FILES,
+) -> list[dict]:
+    """Load skills from committed JSON if available, else parse RST.
+
+    JSON is the fast path for agents at runtime. RST is the source of truth
+    and is used as fallback when JSON hasn't been generated yet (e.g. during
+    development before running ``--generate``).
+
+    The JSON fast path is only used when the caller is using the default RST
+    paths. If a caller overrides ``rst_paths`` (e.g. in tests), RST is always
+    parsed so the custom paths are respected.
+    """
+    if rst_paths is AGENT_SKILLS_RST_FILES and json_path.exists():
+        return json.loads(json_path.read_text(encoding="utf-8"))
+    return _parse_skills_from_files(rst_paths)
+
+
+def generate_skills_json(
+    output_path: Path = AGENT_SKILLS_JSON,
+    rst_paths: list[Path] = AGENT_SKILLS_RST_FILES,
+) -> None:
+    """Parse RST sources and write the canonical skills.json artifact.
+
+    Run this after editing any ``.. agent-skill::`` directive and commit the
+    updated skills.json alongside the RST change so the two stay in sync.
+
+    Raises:
+        FileNotFoundError: if any RST source file does not exist.
+        ValueError: if a duplicate skill id is found.
+    """
+    skills = _parse_skills_from_files(rst_paths)
+    output_path.write_text(json.dumps(skills, indent=2) + "\n", encoding="utf-8")
+    print(f"Generated {output_path.relative_to(REPO_ROOT)} ({len(skills)} skill(s))")
+
+
+def check_skills_json_drift(
+    json_path: Path = AGENT_SKILLS_JSON,
+    rst_paths: list[Path] = AGENT_SKILLS_RST_FILES,
+) -> list[str]:
+    """Return error strings if skills.json doesn't match what RST sources 
produce.
+
+    Used by the prek ``check-agent-skills-drift`` hook. An empty list means the
+    JSON is up to date and the commit can proceed.
+    """
+    errors: list[str] = []
+
+    if not json_path.exists():
+        errors.append(
+            f"skills.json not found at {json_path.relative_to(REPO_ROOT)}. "
+            f"Run: python scripts/ci/prek/context_detect.py --generate"
+        )
+        return errors
+
+    try:
+        current = _parse_skills_from_files(rst_paths)
+    except (FileNotFoundError, ValueError) as exc:
+        errors.append(str(exc))
+        return errors
+
+    stored = json.loads(json_path.read_text(encoding="utf-8"))
+
+    # Normalize both sides to a canonical JSON string for comparison.
+    current_canonical = json.dumps(current, indent=2, sort_keys=True)
+    stored_canonical = json.dumps(stored, indent=2, sort_keys=True)
+
+    if current_canonical != stored_canonical:
+        errors.append(
+            "skills.json is out of sync with RST sources.\n"
+            "  Run: python scripts/ci/prek/context_detect.py --generate\n"
+            f"  Then commit the updated {json_path.relative_to(REPO_ROOT)}"
+        )
+    return errors
+
+
+# ---------------------------------------------------------------------------
+# Command routing
+# ---------------------------------------------------------------------------
+
+
+def get_command(
+    skill_id: str,
+    rst_paths: list[Path] = AGENT_SKILLS_RST_FILES,
+    **kwargs: str,
+) -> str:
+    """Return the correct command for skill_id in the current context.
+
+    Applies {placeholder} substitution from kwargs.
+    If any kwarg value is 'false' and the skill has a non-preferred (fallback)
+    step for the current context, the fallback step is used instead.
+
+    Raises:
+        FileNotFoundError: if any source RST file does not exist.
+        KeyError: if skill_id is not found.
+        ValueError: if a required parameter placeholder is missing from kwargs.
+    """
+    skills = _load_skills(rst_paths=rst_paths)
+    skill = _find_skill(skill_id, skills)
+    ctx = get_context()
+
+    context_steps = [s for s in skill["steps"] if s["context"] in (ctx, "either")]
+
+    if not context_steps:
+        available_contexts = {s["context"] for s in skill["steps"]}
+        return (
+            f"Skill '{skill_id}' has no steps for context '{ctx}'. "
+            f"This skill requires: {sorted(available_contexts)}"
+        )
+
+    preferred_steps = [s for s in context_steps if s.get("preferred", True)]
+    fallback_steps = [s for s in context_steps if not s.get("preferred", True)]
+
+    use_fallback = any(v == "false" for v in kwargs.values()) and bool(fallback_steps)
+    if use_fallback:
+        selected_step = fallback_steps[0]
+    elif preferred_steps:
+        selected_step = preferred_steps[0]
+    else:
+        selected_step = context_steps[0]
+
+    cmd = selected_step["command"]
+    # Exclude "false" sentinel values from substitution
+    sub_kwargs = {k: v for k, v in kwargs.items() if v != "false"}
+    try:
+        return cmd.format(**sub_kwargs)
+    except KeyError as exc:
+        missing = exc.args[0]
+        raise ValueError(
+            f"Missing parameter '{missing}' for skill '{skill_id}'. Command 
template: {cmd}"
+        ) from exc
+
+
+# ---------------------------------------------------------------------------
+# Skill discovery
+# ---------------------------------------------------------------------------
+
+
+def list_skills_for_context(
+    category: str | None = None,
+    rst_paths: list[Path] = AGENT_SKILLS_RST_FILES,
+) -> list[dict]:
+    """Return all skills that have at least one step valid for the current 
context.
+
+    Optionally filter by category.
+
+    Raises:
+        FileNotFoundError: if any source RST file does not exist.
+    """
+    ctx = get_context()
+    skills = _load_skills(rst_paths=rst_paths)
+
+    def _has_step_for_context(skill: dict) -> bool:
+        return any(s["context"] in (ctx, "either") for s in skill.get("steps", 
[]))
+
+    result = [s for s in skills if _has_step_for_context(s)]
+    if category:
+        result = [s for s in result if s.get("category") == category]
+    return result
+
+
+# ---------------------------------------------------------------------------
+# CLI entry point
+# ---------------------------------------------------------------------------
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser(
+        description="Resolve the exact command for a skill in the current 
execution context.",
+        epilog="Parameters are passed as key=value pairs after the skill ID.",
+    )
+    parser.add_argument("skill_id", nargs="?", help="Skill ID (defined in 
contributing-docs source files)")
+    parser.add_argument(
+        "params",
+        nargs="*",
+        metavar="key=value",
+        help="Parameter substitutions (e.g. project=providers/vertica)",
+    )
+    parser.add_argument(
+        "--list",
+        action="store_true",
+        help="List all available skill IDs and exit",
+    )
+    parser.add_argument(
+        "--generate",
+        action="store_true",
+        help="Generate skills.json from RST sources and exit",
+    )
+    parser.add_argument(
+        "--check",
+        action="store_true",
+        help="Check skills.json is in sync with RST and exit (used by prek 
hook)",
+    )
+    parser.add_argument(
+        "--context",
+        choices=["host", "breeze"],
+        help="Override context detection (default: auto-detect)",
+    )
+    args = parser.parse_args()
+
+    if args.generate:
+        try:
+            generate_skills_json()
+            return 0
+        except (FileNotFoundError, ValueError) as exc:
+            print(str(exc), file=sys.stderr)
+            return 1
+
+    if args.check:
+        drift_errors = check_skills_json_drift()
+        if drift_errors:
+            for err in drift_errors:
+                print(f"ERROR: {err}", file=sys.stderr)
+            return 1
+        total = len(_load_skills())
+        print(f"OK: skills.json is up to date ({total} skill(s))")
+        return 0
+
+    try:
+        skills = _load_skills()
+    except (FileNotFoundError, ValueError) as exc:
+        print(str(exc), file=sys.stderr)
+        return 1
+
+    if args.list:
+        for entry in skills:
+            print(f"{entry['id']:30s}  {entry['description']}")
+        return 0
+
+    if not args.skill_id:
+        parser.error("skill_id is required unless --list is used")
+
+    skill: dict | None = next((s for s in skills if s["id"] == args.skill_id), None)
+    if skill is None:
+        known = [s["id"] for s in skills]
+        print(f"ERROR: unknown skill '{args.skill_id}'. Known: {', 
'.join(known)}", file=sys.stderr)
+        return 1
+
+    params: dict[str, str] = {}
+    for p in args.params:
+        if "=" not in p:
+            print(f"ERROR: parameter must be key=value, got: '{p}'", 
file=sys.stderr)
+            return 1
+        k, v = p.split("=", 1)
+        params[k.strip()] = v.strip()
+
+    context = args.context or get_context()
+    ctx_steps = [s for s in skill["steps"] if s["context"] in (context, "either")]
+    if not ctx_steps:
+        available = sorted({s["context"] for s in skill["steps"]})
+        print(
+            f"ERROR: skill '{args.skill_id}' has no steps for context 
'{context}'. Requires: {available}",
+            file=sys.stderr,
+        )
+        return 1
+
+    preferred = next((s for s in ctx_steps if s.get("preferred", True)), ctx_steps[0])
+    fallback_steps = [s for s in ctx_steps if not s.get("preferred", True)]
+
+    def substitute(cmd: str) -> str:
+        for k, v in params.items():
+            cmd = cmd.replace(f"{{{k}}}", v)
+        return cmd
+
+    command = substitute(preferred["command"])
+    print(command)
+
+    if fallback_steps:
+        print(
+            f"# fallback (if system deps missing): 
{substitute(fallback_steps[0]['command'])}",
+            file=sys.stderr,
+        )

Review Comment:
   CLI resolution logic diverges from `get_command()`: it always prints the 
preferred step and only emits the fallback as a comment, while `get_command()` 
can actually switch to the fallback based on the "false" sentinel. The CLI also 
uses manual `.replace()` substitution rather than the same `.format()` behavior 
(so missing placeholders won't error consistently). To avoid confusing 
users/agents and to keep drift checks meaningful, consider routing the CLI 
execution through `get_command()` (with `--context` support) and reusing the 
same substitution + fallback semantics.
   ```suggestion
       try:
           # Delegate resolution and substitution (including fallback semantics)
           # to the shared get_command() implementation so CLI and API stay in sync.
           command = get_command(args.skill_id, context=context, **params)
       except Exception as exc:  # noqa: BLE001
           print(f"ERROR: {exc}", file=sys.stderr)
           return 1
   
       print(command)
   ```
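
   The substitution divergence the comment describes is easy to demonstrate. The command template below is made up for illustration, not the skill's actual command:

   ```python
   # The CLI's chained .replace() leaves an unresolved placeholder in the output,
   # while get_command()'s .format() raises KeyError on the missing key.
   template = "breeze testing tests --project {project} {test_path}"
   params = {"project": "providers/amazon"}  # test_path deliberately missing

   # Manual substitution (current CLI behavior): silently keeps "{test_path}".
   replaced = template
   for key, value in params.items():
       replaced = replaced.replace(f"{{{key}}}", value)
   print(replaced)
   # -> breeze testing tests --project providers/amazon {test_path}

   # format()-based substitution (get_command() behavior): fails loudly.
   try:
       template.format(**params)
   except KeyError as exc:
       print(f"missing placeholder: {exc.args[0]}")
   # -> missing placeholder: test_path
   ```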



##########
scripts/ci/prek/context_detect.py:
##########
@@ -0,0 +1,510 @@
+#!/usr/bin/env python
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Runtime host vs Breeze container detection for AI agent skills.
+
+``.. agent-skill::`` directives are embedded directly in contributing-docs RST
+files — the source of truth. A generator extracts them into ``skills.json`` so
+agents can read a static artifact without parsing RST at every invocation.
+
+Source files scanned (see AGENT_SKILLS_RST_FILES):
+  contributing-docs/03a_contributors_quick_start_beginners.rst
+  contributing-docs/08_static_code_checks.rst
+  contributing-docs/11_documentation_building.rst
+  contributing-docs/testing/unit_tests.rst
+
+Generated artifact (see AGENT_SKILLS_JSON):
+  .github/skills/airflow-contributor/assets/skills.json
+
+The JSON is regenerated with ``--generate`` and validated against the RST with
+``--check`` (run as a prek hook so drift is caught at commit time).
+
+Detection priority chain:
+  1. AIRFLOW_BREEZE_CONTAINER env var set  -> "breeze"
+  2. /.dockerenv file exists               -> "breeze"
+  3. /opt/airflow path exists              -> "breeze"
+  4. (default)                             -> "host"
+
+Usage (importable):
+    from ci.prek.context_detect import get_context, get_command
+
+    ctx = get_context()                    # "host" or "breeze"
+    cmd = get_command("run-single-test",   # raises if skill not found
+                      project="providers/amazon",
+                      test_path="providers/amazon/tests/test_s3.py")
+
+Usage (CLI):
+    python scripts/ci/prek/context_detect.py --list
+    python scripts/ci/prek/context_detect.py --generate        # write 
skills.json
+    python scripts/ci/prek/context_detect.py --check           # drift check 
(CI)
+    python scripts/ci/prek/context_detect.py run-single-test 
project=providers/vertica test_path=...
+    python scripts/ci/prek/context_detect.py run-single-test --context breeze 
test_path=...
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import re
+import sys
+from pathlib import Path
+
+REPO_ROOT = Path(__file__).resolve().parents[3]
+
+# Contributing-docs files that contain embedded ``.. agent-skill::`` 
directives.
+# Skills live next to the human-readable workflow they describe so updates are 
atomic.
+AGENT_SKILLS_RST_FILES: list[Path] = [
+    REPO_ROOT / "contributing-docs" / 
"03a_contributors_quick_start_beginners.rst",
+    REPO_ROOT / "contributing-docs" / "08_static_code_checks.rst",
+    REPO_ROOT / "contributing-docs" / "11_documentation_building.rst",
+    REPO_ROOT / "contributing-docs" / "testing" / "unit_tests.rst",
+]
+
+_DIRECTIVE_RE = re.compile(r"^\.\.\s+agent-skill::\s*$")
+_OPTION_RE = re.compile(r"^\s+:([^:]+):\s+(.+)$")
+
+# Committed JSON artifact — agent-facing interface generated from the RST 
sources.
+AGENT_SKILLS_JSON: Path = REPO_ROOT / ".github" / "skills" / 
"airflow-contributor" / "assets" / "skills.json"
+
+
+# ---------------------------------------------------------------------------
+# Context detection
+# ---------------------------------------------------------------------------
+
+
+def get_context() -> str:
+    """Return 'breeze' if running inside the Breeze container, else 'host'.
+
+    Detection priority:
+      1. AIRFLOW_BREEZE_CONTAINER env var
+      2. /.dockerenv file (set by Docker runtime)
+      3. /opt/airflow directory (Breeze mount point)
+      4. default: host
+    """
+    if os.environ.get("AIRFLOW_BREEZE_CONTAINER"):
+        return "breeze"
+    if Path("/.dockerenv").exists():
+        return "breeze"
+    if Path("/opt/airflow").exists():
+        return "breeze"
+    return "host"
+
+
+# ---------------------------------------------------------------------------
+# RST parsing
+# ---------------------------------------------------------------------------
+
+
+def _collect_options(lines: list[str], start: int) -> tuple[dict[str, str], 
int]:
+    opts: dict[str, str] = {}
+    i = start
+    while i < len(lines):
+        m = _OPTION_RE.match(lines[i])
+        if m:
+            opts[m.group(1)] = m.group(2).strip()
+            i += 1
+        elif lines[i].strip() == "" or not lines[i].startswith(" "):
+            break
+        else:
+            i += 1
+    return opts, i
+
+
+def _parse_skills(rst_path: Path) -> list[dict]:
+    """Parse all ``.. agent-skill::`` directives from rst_path.
+
+    Returns a list of skill dicts with fields:
+      id, category, description, steps, prereqs, params, expected_output
+
+    Each step dict has: context ("host"|"breeze"), command, preferred (bool).
+    A skill with both :local: and :fallback: produces two host steps —
+    preferred=True for local, preferred=False for fallback.
+
+    Raises:
+        FileNotFoundError: if rst_path does not exist.
+    """
+    if not rst_path.exists():
+        raise FileNotFoundError(f"agent_skills.rst not found at {rst_path}")
+
+    lines = rst_path.read_text(encoding="utf-8").splitlines()
+    skills: list[dict] = []
+    i = 0
+    while i < len(lines):
+        if _DIRECTIVE_RE.match(lines[i]):
+            opts, i = _collect_options(lines, i + 1)
+            skill_id = opts.get("id", "").strip()
+            if not skill_id:
+                continue
+
+            steps: list[dict] = []
+            local_cmd = opts.get("local", "").strip()
+            fallback_cmd = opts.get("fallback", "").strip()
+            breeze_cmd = opts.get("breeze", "").strip()
+
+            if local_cmd:
+                steps.append({"context": "host", "command": local_cmd, 
"preferred": True})
+            if fallback_cmd:
+                steps.append({"context": "host", "command": fallback_cmd, 
"preferred": False})
+            if breeze_cmd:
+                steps.append({"context": "breeze", "command": breeze_cmd, 
"preferred": True})
+
+            prereqs = [p.strip() for p in opts.get("prereqs", "").split(",") 
if p.strip()]
+            params: dict[str, bool] = {}
+            for raw_param in opts.get("params", "").split(","):
+                raw = raw_param.strip()
+                if ":" in raw:
+                    name, req = raw.split(":", 1)
+                    params[name.strip()] = req.strip() == "required"
+
+            skills.append(
+                {
+                    "id": skill_id,
+                    "category": opts.get("category", ""),
+                    "description": opts.get("description", ""),
+                    "steps": steps,
+                    "prereqs": prereqs,
+                    "params": params,
+                    "expected_output": opts.get("expected-output", ""),
+                }
+            )
+        else:
+            i += 1
+
+    return skills
+
+
+def _parse_skills_from_files(rst_paths: list[Path] = AGENT_SKILLS_RST_FILES) 
-> list[dict]:
+    """Parse ``.. agent-skill::`` directives from all contributing-docs source 
files.
+
+    Merges results across files. Raises ValueError if the same skill id appears
+    in more than one file (duplicate IDs are a configuration error).
+
+    Raises:
+        FileNotFoundError: if any path in rst_paths does not exist.
+        ValueError: if a duplicate skill id is found across files.
+    """
+    all_skills: list[dict] = []
+    seen: dict[str, Path] = {}
+    for path in rst_paths:
+        for skill in _parse_skills(path):
+            sid = skill["id"]
+            if sid in seen:
+                raise ValueError(
+                    f"Duplicate skill id '{sid}' found in {path} (already defined in {seen[sid]})"
+                )
+            seen[sid] = path
+            all_skills.append(skill)
+    return all_skills
+
+
+def _find_skill(skill_id: str, skills: list[dict]) -> dict:
+    """Return skill dict by id. Raises KeyError if not found."""
+    for skill in skills:
+        if skill.get("id") == skill_id:
+            return skill
+    known = [s.get("id") for s in skills]
+    raise KeyError(f"Skill '{skill_id}' not found. Available: {known}")
+
+
+# ---------------------------------------------------------------------------
+# JSON artifact: generate and drift-check
+# ---------------------------------------------------------------------------
+
+
+def _load_skills(
+    json_path: Path = AGENT_SKILLS_JSON,
+    rst_paths: list[Path] = AGENT_SKILLS_RST_FILES,
+) -> list[dict]:
+    """Load skills from committed JSON if available, else parse RST.
+
+    JSON is the fast path for agents at runtime. RST is the source of truth
+    and is used as fallback when JSON hasn't been generated yet (e.g. during
+    development before running ``--generate``).
+
+    The JSON fast path is only used when the caller is using the default RST
+    paths. If a caller overrides ``rst_paths`` (e.g. in tests), RST is always
+    parsed so the custom paths are respected.
+    """
+    if rst_paths is AGENT_SKILLS_RST_FILES and json_path.exists():
+        return json.loads(json_path.read_text(encoding="utf-8"))
+    return _parse_skills_from_files(rst_paths)
+
+
+def generate_skills_json(
+    output_path: Path = AGENT_SKILLS_JSON,
+    rst_paths: list[Path] = AGENT_SKILLS_RST_FILES,
+) -> None:
+    """Parse RST sources and write the canonical skills.json artifact.
+
+    Run this after editing any ``.. agent-skill::`` directive and commit the
+    updated skills.json alongside the RST change so the two stay in sync.
+
+    Raises:
+        FileNotFoundError: if any RST source file does not exist.
+        ValueError: if a duplicate skill id is found.
+    """
+    skills = _parse_skills_from_files(rst_paths)
+    output_path.write_text(json.dumps(skills, indent=2) + "\n", encoding="utf-8")
+    print(f"Generated {output_path.relative_to(REPO_ROOT)} ({len(skills)} skill(s))")
+
+
+def check_skills_json_drift(
+    json_path: Path = AGENT_SKILLS_JSON,
+    rst_paths: list[Path] = AGENT_SKILLS_RST_FILES,
+) -> list[str]:
+    """Return error strings if skills.json doesn't match what RST sources produce.
+
+    Used by the prek ``check-agent-skills-drift`` hook. An empty list means the
+    JSON is up to date and the commit can proceed.
+    """
+    errors: list[str] = []
+
+    if not json_path.exists():
+        errors.append(
+            f"skills.json not found at {json_path.relative_to(REPO_ROOT)}. "
+            f"Run: python scripts/ci/prek/context_detect.py --generate"
+        )
+        return errors
+
+    try:
+        current = _parse_skills_from_files(rst_paths)
+    except (FileNotFoundError, ValueError) as exc:
+        errors.append(str(exc))
+        return errors
+
+    stored = json.loads(json_path.read_text(encoding="utf-8"))
+
+    # Normalize both sides to a canonical JSON string for comparison.
+    current_canonical = json.dumps(current, indent=2, sort_keys=True)
+    stored_canonical = json.dumps(stored, indent=2, sort_keys=True)
+
+    if current_canonical != stored_canonical:
+        errors.append(
+            "skills.json is out of sync with RST sources.\n"
+            "  Run: python scripts/ci/prek/context_detect.py --generate\n"
+            f"  Then commit the updated {json_path.relative_to(REPO_ROOT)}"
+        )
+    return errors
+
+
+# ---------------------------------------------------------------------------
+# Command routing
+# ---------------------------------------------------------------------------
+
+
+def get_command(
+    skill_id: str,
+    rst_paths: list[Path] = AGENT_SKILLS_RST_FILES,
+    **kwargs: str,
+) -> str:
+    """Return the correct command for skill_id in the current context.
+
+    Applies {placeholder} substitution from kwargs.
+    If any kwarg value is 'false' and the skill has a non-preferred (fallback)
+    step for the current context, the fallback step is used instead.
+
+    Raises:
+        FileNotFoundError: if any source RST file does not exist.
+        KeyError: if skill_id is not found.
+        ValueError: if a required parameter placeholder is missing from kwargs.
+    """
+    skills = _load_skills(rst_paths=rst_paths)
+    skill = _find_skill(skill_id, skills)
+    ctx = get_context()
+
+    context_steps = [s for s in skill["steps"] if s["context"] in (ctx, "either")]
+
+    if not context_steps:
+        available_contexts = {s["context"] for s in skill["steps"]}
+        return (
+            f"Skill '{skill_id}' has no steps for context '{ctx}'. "
+            f"This skill requires: {sorted(available_contexts)}"
+        )
+
+    preferred_steps = [s for s in context_steps if s.get("preferred", True)]
+    fallback_steps = [s for s in context_steps if not s.get("preferred", True)]
+
+    use_fallback = any(v == "false" for v in kwargs.values()) and bool(fallback_steps)
+    if use_fallback:
+        selected_step = fallback_steps[0]
+    elif preferred_steps:
+        selected_step = preferred_steps[0]
+    else:
+        selected_step = context_steps[0]
+
+    cmd = selected_step["command"]
+    # Exclude "false" sentinel values from substitution
+    sub_kwargs = {k: v for k, v in kwargs.items() if v != "false"}
+    try:
+        return cmd.format(**sub_kwargs)
+    except KeyError as exc:

Review Comment:
   `get_command()` selects the non-preferred host step whenever *any* kwarg 
value equals the literal string "false" (`use_fallback = any(v == "false" for v 
in kwargs.values())`). This is overly broad: unrelated params (or a legitimate 
value of "false") can unexpectedly flip to the fallback command. Recommend 
making fallback selection explicit and keyed (e.g., only honor a specific flag 
like `system_deps_available=false` or `use_fallback=true`), or encode fallback 
conditions in the skill definition instead of scanning all kwargs.
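A minimal sketch of the keyed selection this comment recommends (the `select_step` helper and the `use_fallback` flag are hypothetical names for illustration, not part of the PR):

```python
# Hypothetical sketch: trigger the fallback only via a dedicated flag,
# instead of scanning every kwarg for the literal string "false".
def select_step(context_steps: list[dict], kwargs: dict[str, str]) -> dict:
    """Return the step to run; preferred unless explicitly overridden."""
    preferred = [s for s in context_steps if s.get("preferred", True)]
    fallback = [s for s in context_steps if not s.get("preferred", True)]
    # Pop the flag so it never leaks into {placeholder} substitution.
    if kwargs.pop("use_fallback", "") == "true" and fallback:
        return fallback[0]
    return (preferred or context_steps)[0]
```

With this shape, a param that legitimately carries the value "false" (e.g. `verbose=false`) no longer flips the command to the fallback.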



##########
scripts/ci/prek/context_detect.py:
##########
@@ -0,0 +1,510 @@
+#!/usr/bin/env python
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Runtime host vs Breeze container detection for AI agent skills.
+
+``.. agent-skill::`` directives are embedded directly in contributing-docs RST
+files — the source of truth. A generator extracts them into ``skills.json`` so
+agents can read a static artifact without parsing RST at every invocation.
+
+Source files scanned (see AGENT_SKILLS_RST_FILES):
+  contributing-docs/03a_contributors_quick_start_beginners.rst
+  contributing-docs/08_static_code_checks.rst
+  contributing-docs/11_documentation_building.rst
+  contributing-docs/testing/unit_tests.rst
+
+Generated artifact (see AGENT_SKILLS_JSON):
+  .github/skills/airflow-contributor/assets/skills.json
+
+The JSON is regenerated with ``--generate`` and validated against the RST with
+``--check`` (run as a prek hook so drift is caught at commit time).
+
+Detection priority chain:
+  1. AIRFLOW_BREEZE_CONTAINER env var set  -> "breeze"
+  2. /.dockerenv file exists               -> "breeze"
+  3. /opt/airflow path exists              -> "breeze"
+  4. (default)                             -> "host"
+
+Usage (importable):
+    from ci.prek.context_detect import get_context, get_command
+
+    ctx = get_context()                    # "host" or "breeze"
+    cmd = get_command("run-single-test",   # raises if skill not found
+                      project="providers/amazon",
+                      test_path="providers/amazon/tests/test_s3.py")
+
+Usage (CLI):
+    python scripts/ci/prek/context_detect.py --list
+    python scripts/ci/prek/context_detect.py --generate        # write skills.json
+    python scripts/ci/prek/context_detect.py --check           # drift check (CI)
+    python scripts/ci/prek/context_detect.py run-single-test project=providers/vertica test_path=...
+    python scripts/ci/prek/context_detect.py run-single-test --context breeze test_path=...
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import re
+import sys
+from pathlib import Path
+
+REPO_ROOT = Path(__file__).resolve().parents[3]
+
+# Contributing-docs files that contain embedded ``.. agent-skill::`` directives.
+# Skills live next to the human-readable workflow they describe so updates are atomic.
+AGENT_SKILLS_RST_FILES: list[Path] = [
+    REPO_ROOT / "contributing-docs" / "03a_contributors_quick_start_beginners.rst",
+    REPO_ROOT / "contributing-docs" / "08_static_code_checks.rst",
+    REPO_ROOT / "contributing-docs" / "11_documentation_building.rst",
+    REPO_ROOT / "contributing-docs" / "testing" / "unit_tests.rst",
+]
+
+_DIRECTIVE_RE = re.compile(r"^\.\.\s+agent-skill::\s*$")
+_OPTION_RE = re.compile(r"^\s+:([^:]+):\s+(.+)$")
+
+# Committed JSON artifact — agent-facing interface generated from the RST sources.
+AGENT_SKILLS_JSON: Path = REPO_ROOT / ".github" / "skills" / "airflow-contributor" / "assets" / "skills.json"
+
+
+# ---------------------------------------------------------------------------
+# Context detection
+# ---------------------------------------------------------------------------
+
+
+def get_context() -> str:
+    """Return 'breeze' if running inside the Breeze container, else 'host'.
+
+    Detection priority:
+      1. AIRFLOW_BREEZE_CONTAINER env var
+      2. /.dockerenv file (set by Docker runtime)
+      3. /opt/airflow directory (Breeze mount point)
+      4. default: host
+    """
+    if os.environ.get("AIRFLOW_BREEZE_CONTAINER"):
+        return "breeze"
+    if Path("/.dockerenv").exists():
+        return "breeze"
+    if Path("/opt/airflow").exists():
+        return "breeze"
+    return "host"

Review Comment:
   `get_context()` prefers `AIRFLOW_BREEZE_CONTAINER`, but that env var doesn’t 
appear to be set anywhere else in the repo; Breeze does set `BREEZE=true` 
(e.g., `dev/breeze/src/airflow_breeze/params/shell_params.py`). Consider 
aligning with existing convention by checking `BREEZE` (and/or `/.dockerenv`) 
rather than introducing a new marker that’s easy to forget to export, or ensure 
Breeze actually exports `AIRFLOW_BREEZE_CONTAINER`.
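A sketch of the aligned detection this comment suggests (it assumes, per the comment, that Breeze exports `BREEZE=true`; the PR itself does not confirm that):

```python
import os
from pathlib import Path


def get_context() -> str:
    """Sketch: prefer Breeze's existing BREEZE=true convention, then fall
    back to the generic Docker marker the PR already checks."""
    if os.environ.get("BREEZE", "").lower() == "true":
        return "breeze"
    if Path("/.dockerenv").exists():  # present in any Docker container, not only Breeze
        return "breeze"
    return "host"
```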



##########
contributing-docs/03a_contributors_quick_start_beginners.rst:
##########
@@ -90,6 +90,14 @@ and launches all Airflow necessary components in those terminals. To know more a
 check out this cheat sheet: https://tmuxcheatsheet.com/. Now You can also access Airflow UI on your local machine at `http://localhost:28080 <http://localhost:28080>`_ with user name ``admin`` and password ``admin``. To exit breeze, type ``stop_airflow`` in any
 of the tmux panes and hit Enter
 
+.. agent-skill::
+   :id: setup-breeze-environment
+   :category: environment
+   :description: Start the Airflow Breeze development environment. Only run on the host — never nest breeze inside breeze.
+   :local: breeze start-airflow
+   :breeze: echo "Already inside Breeze — no setup needed"

Review Comment:
   For `setup-breeze-environment`, the `:breeze:` step returns `echo "Already 
inside Breeze — no setup needed"`, but the description says “preventing 
incorrect action” and tests only assert it’s non-empty. Consider making this 
guidance message more actionable/structured (e.g., explicitly stating that 
Breeze must be started on the host and suggesting the next safe command inside 
Breeze), since this string is part of the agent-facing API.
   ```suggestion
      :breeze: echo "You are already inside the Breeze development container. Do NOT run 'breeze' commands here. Instead, use the existing tmux panes or run 'stop_airflow' followed by 'start_airflow' inside this shell if you need to restart Airflow services."
   ```



##########
scripts/tests/ci/prek/test_contributor_scenario_exam.py:
##########
@@ -0,0 +1,301 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Contributor scenario exam.
+
+These tests verify that an AI agent using the skill system gets the *correct*
+command for each realistic contributor situation — using the real contributing-docs
+source files, not synthetic fixtures.
+
+Each test is named after the scenario a contributor (or agent) faces. The test
+documents what the agent would do WITHOUT skills (the wrong command) and confirms
+that WITH skills it gets the correct command.
+
+This is the evaluation harness described in the GSoC project goals:
+  "Design a testable user scenario or 'exam' that simulates a typical contribution
+   workflow to verify that the added skills work as intended."
+"""
+
+from __future__ import annotations
+
+import ci.prek.context_detect as cd
+from ci.prek.context_detect import get_command, list_skills_for_context
+
+# ---------------------------------------------------------------------------
+# Scenario 1: Running a non-DB test on the host
+# ---------------------------------------------------------------------------
+
+
+def test_scenario_host_runs_single_test_with_uv(monkeypatch):
+    """Agent fixed a bug in providers/amazon and wants to run the test.
+
+    WITHOUT skills: agent might run `pytest providers/amazon/tests/... -xvs`
+                    directly on host — fails if MySQL libs are missing, or
+                    resolves the wrong virtualenv.
+    WITH skills:    agent calls run-single-test → gets uv command scoped to
+                    the correct project, which handles the monorepo venv correctly.
+    """
+    monkeypatch.setattr(cd, "get_context", lambda: "host")
+
+    cmd = get_command(
+        "run-single-test",
+        project="providers/amazon",
+        test_path="providers/amazon/tests/unit/hooks/test_s3_hook.py",
+    )
+
+    # Must use uv scoped to the provider — not bare pytest
+    assert cmd.startswith("uv run --project providers/amazon"), (
+        f"Expected uv-scoped command on host, got: {cmd}"
+    )
+    assert "providers/amazon/tests/unit/hooks/test_s3_hook.py" in cmd
+    assert "-xvs" in cmd
+
+
+# ---------------------------------------------------------------------------
+# Scenario 2: System dependencies missing — fallback to breeze
+# ---------------------------------------------------------------------------
+
+
+def test_scenario_host_falls_back_to_breeze_when_system_deps_missing(monkeypatch):
+    """Agent tries uv but MySQL native libs are missing (e.g. on a fresh macOS).
+
+    WITHOUT skills: agent has no way to know there is a fallback — it gives up
+                    or runs breeze incorrectly (breeze shell instead of breeze run).
+    WITH skills:    agent passes system_deps_available=false → gets the breeze
+                    fallback command automatically.
+    """
+    monkeypatch.setattr(cd, "get_context", lambda: "host")
+
+    cmd = get_command(
+        "run-single-test",
+        project="providers/amazon",
+        test_path="providers/amazon/tests/unit/hooks/test_s3_hook.py",
+        system_deps_available="false",
+    )
+
+    assert cmd.startswith("breeze run pytest"), (
+        f"Expected breeze fallback on host with missing deps, got: {cmd}"
+    )
+    assert "providers/amazon/tests/unit/hooks/test_s3_hook.py" in cmd
+
+
+# ---------------------------------------------------------------------------
+# Scenario 3: DB test — must never use uv
+# ---------------------------------------------------------------------------
+
+
+def test_scenario_db_test_always_uses_breeze_on_host(monkeypatch):
+    """Agent sees @pytest.mark.db_test and chooses the correct runner.
+
+    WITHOUT skills: agent might try uv — cannot provision a live database,
+                    test fails with connection errors.
+    WITH skills:    agent uses run-db-test → always gets breeze, even on host.
+    """
+    monkeypatch.setattr(cd, "get_context", lambda: "host")
+
+    cmd = get_command(
+        "run-db-test",
+        test_path="airflow-core/tests/models/test_dag.py",
+    )
+
+    assert "breeze run pytest" in cmd, f"DB tests must always use breeze on host, got: {cmd}"
+    assert "uv run" not in cmd, "uv must never be used for DB tests"
+    assert "airflow-core/tests/models/test_dag.py" in cmd

Review Comment:
   The DB test scenario uses 
`test_path="airflow-core/tests/models/test_dag.py"`, but the contributing docs 
note this file was relocated under 
`airflow-core/tests/unit/models/test_dag.py`. If the old path no longer exists, 
this scenario will either fail or test the wrong thing. Suggest updating the 
path in the exam to match the current location used in the docs/examples.
   ```suggestion
           test_path="airflow-core/tests/unit/models/test_dag.py",
       )
   
       assert "breeze run pytest" in cmd, f"DB tests must always use breeze on host, got: {cmd}"
       assert "uv run" not in cmd, "uv must never be used for DB tests"
       assert "airflow-core/tests/unit/models/test_dag.py" in cmd
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
