(airflow) branch main updated: prepare-providers-doc skill: batch classification to cut sub-agents (#68653)

potiuk Wed, 17 Jun 2026 02:25:45 -0700

This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git



The following commit(s) were added to refs/heads/main by this push:
     new 272904f4c03 prepare-providers-doc skill: batch classification to cut 
sub-agents (#68653)
272904f4c03 is described below

commit 272904f4c0353080d13366149cab983f006fc4cd
Author: Jarek Potiuk <[email protected]>
AuthorDate: Wed Jun 17 05:25:29 2026 -0400

    prepare-providers-doc skill: batch classification to cut sub-agents (#68653)
    
    The skill classified provider changes by spawning one Explore sub-agent per
    PR. On a normal release wave (dozens of providers, hundreds of commits) that
    is hundreds of sub-agents — the main reason the run is so long and 
token-heavy.
    
    Replace the per-PR fan-out with a volume-aware strategy: classify inline 
when
    few commits remain (≲15), otherwise spawn one sub-agent per provider that
    classifies that provider's whole remaining commit list in a single pass
    (chunking only providers with >25 commits). Multi-provider PRs are already
    classified independently per provider, so the provider is the natural batch
    unit and the breaking-change checklist is amortized across its commits 
instead
    of paid per commit. This takes the sub-agent count from O(commits) down to
    roughly O(providers) — often zero for incremental updates.
---
 .../prepare-providers-documentation/SKILL.md       | 75 ++++++++++++++--------
 1 file changed, 49 insertions(+), 26 deletions(-)

diff --git a/.agents/skills/prepare-providers-documentation/SKILL.md 
b/.agents/skills/prepare-providers-documentation/SKILL.md
index a02798ac499..688160d3a2a 100644
--- a/.agents/skills/prepare-providers-documentation/SKILL.md
+++ b/.agents/skills/prepare-providers-documentation/SKILL.md
@@ -4,7 +4,8 @@ description: >
   Replace the manual commit-by-commit classification step in
   `breeze release-management prepare-provider-documentation` with AI-driven
   classification. For each provider with pending changes, analyze every PR
-  (using sub-agents per PR), pay special attention to potentially breaking
+  (batched into one sub-agent per provider, not one per PR), pay special
+  attention to potentially breaking
   changes by inspecting the actual diff, scope multi-provider PRs to the
   current provider's slice, ask the release manager when uncertain, and
   apply version bumps + changelog entries. Use during the regular provider
@@ -190,7 +191,7 @@ each commit. Note that some old providers also have legacy 
paths under
 `provider_details.possible_old_provider_paths` semantics by checking the
 provider's `provider.yaml` history if needed).
 
-### Phase 3 — Classify each PR with sub-agents
+### Phase 3 — Classify the PRs (inline, or batched per-provider sub-agents)
 
 For each commit, classify it into one of:
 
@@ -218,21 +219,39 @@ Before spawning a sub-agent, apply the same fast 
heuristics breeze uses
 
 Note these classifications and move on — no sub-agent needed.
 
-#### Sub-agent per PR for the rest
-
-For the remaining commits, spawn sub-agents in parallel (batches of 5–10 to
-avoid context pressure). Use the `Explore` agent type — they need read-only
-access. Brief each sub-agent with:
+#### Classify the rest — batched per provider, not one agent per PR
+
+Classification is the token-heavy part of this skill, so spend sub-agents
+sparingly. Do **not** spawn one sub-agent per PR — that is one agent per
+commit and balloons to hundreds of agents on a normal release wave. Pick the
+smallest fan-out that fits the volume:
+
+- **Few commits remain (≲ 15 across all providers) → classify inline.** Read
+  each PR and its provider-scoped diff yourself, in this context. Spawn no
+  sub-agents at all.
+- **More than that → one sub-agent per provider.** Each agent classifies that
+  provider's *entire* remaining commit list in a single pass. This is the
+  natural unit: multi-provider PRs are classified independently per provider
+  anyway (see Cross-Cutting Rules), and one provider-scoped agent amortizes
+  the breaking-change checklist across all of that provider's commits instead
+  of paying a fresh agent spin-up per commit. Only split a provider across
+  more than one agent when its remaining list is large (> ~25 commits) — chunk
+  it then. This keeps the sub-agent count at roughly the number of providers
+  with pending changes, not the number of commits.
+
+Use the `Explore` agent type — they need read-only access. Brief each
+sub-agent with its provider and the whole batch of commits it owns:
 
 ```
-Classify a single Apache Airflow provider PR.
+Classify a batch of Apache Airflow provider PRs for ONE provider.
 
-PR:        #<NNNN>
-Commit:    <full-hash>
-Subject:   <subject>
 Provider:  <provider-id>      (path: providers/<provider-path>/)
+Commits to classify (<N>) — one row per PR:
+  #<NNNN>  <full-hash>  <subject>
+  #<MMMM>  <full-hash>  <subject>
+  …(this provider's full remaining list)
 
-Tasks:
+For EACH commit above:
 1. Read the PR's title, body, and labels:
    `gh pr view <NNNN> --json title,body,labels,files`
 2. Read the diff for the slice of the PR that touched
@@ -250,14 +269,14 @@ Tasks:
                     changes, type-hint cleanups, no user-visible behavior
    - skip:          only tests/examples/CI for this provider's slice
    - min_airflow_bump: explicitly bumps the minimum Airflow version pin
-4. Output strictly:
-   CLASSIFICATION: <one of: 
documentation|bugfix|feature|breaking|misc|skip|min_airflow_bump>
-   CONFIDENCE: <high|medium|low>
-   JUSTIFICATION: <one sentence>
-   BREAKING_RISK: <none|maybe|yes>     (set "maybe" when the diff has any
-                                         signal from the breaking-change
-                                         checklist, even if you think the
-                                         author intended otherwise)
+4. Set BREAKING_RISK to "maybe" whenever the diff has any signal from the
+   breaking-change checklist below, even if you think the author intended
+   otherwise.
+
+Output one row per commit and nothing else, in this exact pipe format
+(<N> rows for <N> commits):
+
+   #<NNNN> | 
<documentation|bugfix|feature|breaking|misc|skip|min_airflow_bump> | 
<high|medium|low> | <none|maybe|yes> | <one-sentence justification>
 
 Breaking-change checklist (any of these → BREAKING_RISK >= maybe; usually
 breaking unless clearly behind a deprecation shim):
@@ -290,7 +309,8 @@ that removes a public method is breaking. A PR titled 
"BREAKING: rename
 foo" that only renames a private symbol is not.
 ```
 
-Collect all sub-agent results into a table.
+Collect every sub-agent's rows (and any you classified inline) into one
+classification table for Phase 3.5.
 
 ### Phase 3.5 — Confirm with the release manager
 
@@ -542,9 +562,12 @@ If there are zero new commits for a provider, skip it.
 ### Incremental Phase 3 — Classify the new commits
 
 Same logic as Phase 3 of the initial run — including the auto-classify
-heuristic for docs/test-only changes and the sub-agent-per-PR pattern with
-the breaking-change checklist. The output is a per-provider table mapping
-each new commit hash to a classification.
+heuristic for docs/test-only changes and the batched classification (inline
+when few commits remain, otherwise one sub-agent per provider) with the
+breaking-change checklist. Incremental runs usually have only a handful of new
+commits, so prefer classifying them inline rather than spawning any sub-agent.
+The output is a per-provider table mapping each new commit hash to a
+classification.
 
 ### Incremental Phase 3.5 — Decide whether to escalate the version bump
 
@@ -612,8 +635,8 @@ When a single PR touches several providers (e.g.
 `Add Python 3.14 Support (#63520)` touches dozens), classify it
 **independently per provider**. The same PR can be `feature` in one provider
 (a real new capability) and `misc` in another (just a constraint bump in
-`pyproject.toml`). Always scope the sub-agent's diff inspection to the
-current provider's path:
+`pyproject.toml`). Always scope the diff inspection (whether inline or in a
+per-provider sub-agent) to the current provider's path:
 
 ```bash
 gh pr diff <NNNN> -- 'providers/<provider-path>/**'

(airflow) branch main updated: prepare-providers-doc skill: batch classification to cut sub-agents (#68653)

Reply via email to