This is an automated email from the ASF dual-hosted git repository.
potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git
The following commit(s) were added to refs/heads/main by this push:
new 5c6bd32 Add spec-driven build loop and product specs (#250)
5c6bd32 is described below
commit 5c6bd32aa0ffa45089816ae811b07174d0f884fb
Author: Justin Mclean <[email protected]>
AuthorDate: Wed May 27 08:04:06 2026 +1000
Add spec-driven build loop and product specs (#250)
* feat(spec-loop): add spec-driven build loop and product specs
Introduce a spec-driven development loop under tools/spec-loop/ that
develops and maintains the framework against a written description of
what it does.
- specs/ describes the actual functionality (the four live modes, the
security lifecycle, privacy gate, sandbox, CVE tooling, adoption,
adapters, meta tooling) as topic-named files, separate from the RFCs.
- IMPLEMENTATION_PLAN.md holds the prioritised gaps as work items.
- loop.sh runs plan / build / update / consolidate. Build implements one
work item per branch (one work item = one branch = one PR) and never
pushes or opens a PR; the human does that. Update back-fills specs from
functionality contributed outside the loop.
- docs/spec-driven-development.md explains how it works.
Generated-by: Claude (Opus 4.7)
* docs(skills): require an eval suite for every skill
Add to AGENTS.md § Reusable skills the convention that every skill ships
a behavioural eval suite under tools/skill-evals/evals/<skill-name>/
covering each pipeline step with fixture cases — a skill PR without a
matching eval is incomplete. Mirror the rule into the spec-loop build
prompt and operational context so the loop enforces it, and record the
"back-fill missing eval suites" gap in the plan.
Generated-by: Claude (Opus 4.7)
* fix(spec-loop): default loop base to the current branch, not spec-driven
SPEC_LOOP_BASE defaulted to spec-driven — the throwaway scaffolding
branch — which is a dangling reference the moment this PR merges. Default
it instead to the branch the loop is started on (fall back to main if
detached), and point the work-item PR examples at main.
Generated-by: Claude (Opus 4.7)
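A minimal sketch of that default follows. It assumes the current branch is read with `git symbolic-ref`. `default_base` is a hypothetical helper; only `SPEC_LOOP_BASE` comes from the commit itself.

```shell
#!/usr/bin/env bash
# Sketch only: default_base is a hypothetical helper, not loop.sh's code.

# Print the branch HEAD points at, or "main" when HEAD is detached
# (or when we are not inside a git repository at all).
default_base() {
  local cur
  # symbolic-ref exits non-zero on a detached HEAD; --quiet keeps it silent.
  if cur="$(git symbolic-ref --quiet --short HEAD 2>/dev/null)"; then
    printf '%s\n' "$cur"
  else
    printf '%s\n' main
  fi
}

# An explicit SPEC_LOOP_BASE still wins; otherwise use the starting branch.
BASE="${SPEC_LOOP_BASE:-$(default_base)}"
echo "Integration base: $BASE"
```

Because the default is evaluated at start-up, it never refers to a branch name that may vanish once this PR merges.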
* fix(spec-loop): satisfy markdownlint and doctoc
- Tag the push/PR command fences in PROMPT_build.md and PROMPT_update.md
with `text` so markdownlint MD040 (fenced-code-language) passes.
- Exclude tools/spec-loop/ from the doctoc hook: the specs carry YAML
frontmatter and the prompts are short single-purpose docs — same
rationale as the existing skill and skill-evals exclusions.
Generated-by: Claude (Opus 4.7)
* docs(spec-loop): explain --dangerously-skip-permissions in the security model
The loop runs the agent headless with --dangerously-skip-permissions.
Document how that fits the layered sandbox: the flag bypasses the agent
permission layer (.claude/settings.json deny/ask) but NOT the OS sandbox
(clean-env + filesystem/network), which stays the real boundary — matching
the flag's own "sandboxes only" guidance.
- Add a "Security and the dangerously-skip-permissions flag" section to
docs/spec-driven-development.md (what each sandbox layer does, why the
loop stays safe: run-in-sandbox, no credentials, structural containment).
- Add a Security callout to tools/spec-loop/README.md and a SECURITY
header block to loop.sh.
- Harden the invocation: --disallowedTools "Bash(git push *)" "Bash(gh *)"
as defence in depth so a stray push/PR cannot reach the remote.
Generated-by: Claude (Opus 4.7)
* fix(spec-loop): make plan consolidation hysteretic and not livelock
The build loop switched to a consolidation round whenever the plan
exceeded 500 lines, but the consolidate beat must preserve every planned
work item. A plan that was long because of pending work (not stale
history) could therefore never drop below the threshold, so every
subsequent build iteration re-consolidated forever and never built.
- Consolidate at most once (CONSOLIDATE_TRIED latch); if still over after
one round, build anyway and note that the length is planned work. The
latch resets when the plan drops back under the limit.
- Give PROMPT_consolidate a real target (~300 lines, below the 500
trigger) for hysteresis, while preserving every planned work item.
- Make the threshold configurable via SPEC_LOOP_PLAN_MAX (default 500)
and document it.
Generated-by: Claude (Opus 4.7)
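The latch can be sketched as follows. This is a minimal sketch that assumes the plan's line count is already known. `SPEC_LOOP_PLAN_MAX` and `CONSOLIDATE_TRIED` are the names used in the commit; `choose_beat` and `NEXT_BEAT` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the consolidation latch. SPEC_LOOP_PLAN_MAX and CONSOLIDATE_TRIED
# are the names from the commit; choose_beat and NEXT_BEAT are hypothetical.

PLAN_MAX="${SPEC_LOOP_PLAN_MAX:-500}"
CONSOLIDATE_TRIED=false

# Pick the next beat from the plan's line count (e.g. from `wc -l < plan`).
# Call it directly, not inside $(...), so the latch survives between calls.
choose_beat() {
  local plan_lines="$1"
  if [ "$plan_lines" -le "$PLAN_MAX" ]; then
    CONSOLIDATE_TRIED=false       # back under the limit: re-arm the latch
    NEXT_BEAT=build
  elif [ "$CONSOLIDATE_TRIED" = false ]; then
    CONSOLIDATE_TRIED=true        # first time over the limit: consolidate once
    NEXT_BEAT=consolidate
  else
    # Still over after one round, so the length is planned work:
    # build anyway rather than re-consolidating forever.
    NEXT_BEAT=build
  fi
}

for lines in 600 550 550 200 600; do
  choose_beat "$lines"
  echo "$lines lines -> $NEXT_BEAT"
done
```

For that sequence the beats are consolidate, build, build, build, consolidate. Each excursion above the limit earns exactly one consolidation round, so a long plan made up of pending work can no longer livelock the build.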
* loop improvements
* so we can run this before merging
* use git checkout, not git switch, so the loop works with older versions
of git (git switch needs Git 2.23 or later)
* read tooling from control branch, build work items on main
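Reading tooling from the control branch while the working tree sits on main relies on Git's `<ref>:<path>` syntax. The build prompt uses the same syntax (`git show <ref>:tools/spec-loop/IMPLEMENTATION_PLAN.md`). The following is a minimal sketch; `read_from_control_branch` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: read_from_control_branch is a hypothetical helper.

# Print a file as it exists on <ref>, without checking <ref> out.
read_from_control_branch() {
  local ref="$1" path="$2"
  # cat-file -e succeeds only if the object named <ref>:<path> exists.
  if ! git cat-file -e "${ref}:${path}" 2>/dev/null; then
    echo "No '${path}' on '${ref}'" >&2
    return 1
  fi
  git show "${ref}:${path}"
}
```

Because the blob is read from the ref's tree, the result is the same whichever branch is currently checked out.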
* remove the "spec/" prefix from work-item branch names
* fix(spec-loop): correct update-mode note, arg validation, spinner, deny syntax
Address self-review findings on loop.sh:
- update mode no longer receives the build-only "do NOT edit specs" note
when launched from a branch != BASE; it is now told to author the updated
specs on the work branch (update is the one beat that writes specs).
- reject a non-numeric iteration count instead of erroring to stderr and
silently treating it as 0 (i.e. running unbounded).
- spinner indexes the braille frames as an array rather than by byte offset,
so it renders correctly under a C/POSIX locale (the clean-env sandbox
case).
- correct --disallowedTools to the colon wildcard form
(Bash(git push:*) / Bash(gh:*)) so the defense-in-depth deny actually
matches git push / gh invocations.
Generated-by: Claude Code (Opus 4.7)
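Two of these fixes can be sketched as follows: the iteration-count check and the array-indexed spinner. This is a minimal sketch. `parse_iterations`, `spinner_frame` and the exact braille frame set are assumptions for illustration and are not lifted from loop.sh.

```shell
#!/usr/bin/env bash
# Sketch only: parse_iterations, spinner_frame and FRAMES are hypothetical.

# Reject anything that is not a non-negative integer. Otherwise a later
# `[ "$n" -gt 0 ]` prints "integer expression expected", tests false, and
# the loop runs unbounded.
parse_iterations() {
  case "$1" in
    ''|*[!0-9]*)
      echo "Iteration count must be a non-negative integer: '$1'" >&2
      return 1 ;;
    *) printf '%s\n' "$1" ;;
  esac
}

# Frames as array elements: ${FRAMES[i]} returns a whole element, whereas
# ${str:i:1} counts bytes under a C/POSIX locale and would split each
# 3-byte UTF-8 braille character.
FRAMES=(⠋ ⠙ ⠹ ⠸ ⠼ ⠴ ⠦ ⠧ ⠇ ⠏)
spinner_frame() {
  printf '%s' "${FRAMES[$(( $1 % ${#FRAMES[@]} ))]}"
}
```

A caller would validate once before the main loop, for example `MAX_ITERATIONS="$(parse_iterations "${1:-0}")" || exit 1`.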
* fix an endless loop: check local, unpushed work-item branches as well as
open PRs
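The check added here can be sketched outside the runner. This is a minimal sketch that assumes the same filter `local_branch_context` applies in the diff below: every local branch except the integration base and the control branch. `work_item_branches` and `already_built` are hypothetical helpers. In the real loop, the agent reads the list and skips matching items; no shell test does this.

```shell
#!/usr/bin/env bash
# Sketch only: work_item_branches and already_built are hypothetical helpers.

# Every local branch except the integration base and the control branch.
work_item_branches() {
  local base="$1" tooling_ref="$2"
  git for-each-ref --format='%(refname:short)' refs/heads/ \
    | grep -vxF -e "$base" -e "$tooling_ref"
}

# Succeeds if a plan item's slug already exists as a local branch, i.e. it
# was built in an earlier iteration and is waiting for a human to push it.
already_built() {
  git show-ref --verify --quiet "refs/heads/$1"
}
```

An item built but not yet pushed has no PR, so only a ref-based check like this one distinguishes it from unbuilt work.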
---
tools/spec-loop/AGENTS.md | 5 ++
tools/spec-loop/PROMPT_build.md | 22 ++++++---
tools/spec-loop/PROMPT_plan.md | 18 ++++---
tools/spec-loop/loop.sh | 101 +++++++++++++++++++++++-----------------
4 files changed, 89 insertions(+), 57 deletions(-)
diff --git a/tools/spec-loop/AGENTS.md b/tools/spec-loop/AGENTS.md
index 7a839b8..1c4b776 100644
--- a/tools/spec-loop/AGENTS.md
+++ b/tools/spec-loop/AGENTS.md
@@ -60,6 +60,11 @@ commit.
- The **`update`** beat (specs fell behind code others contributed)
branches `sync-specs` and edits `specs/` **only** — it documents
reality, it never changes a skill, tool, or doc outside the spec dir.
+- The runner feeds each iteration **both** the open PRs and the local
+ work-item branches as in-flight work. Because the loop never pushes, a
+ built-but-un-pushed item exists only as a local branch with no PR, so the
+ local-branch list (not just open PRs) is what prevents the loop from
+ rebuilding the same item every iteration.
## Hard limits (governance — do not cross)
diff --git a/tools/spec-loop/PROMPT_build.md b/tools/spec-loop/PROMPT_build.md
index e612e37..353c5f2 100644
--- a/tools/spec-loop/PROMPT_build.md
+++ b/tools/spec-loop/PROMPT_build.md
@@ -11,19 +11,25 @@ Context to load first:
applies (commit trailers, placeholder convention, confidentiality).
- `tools/spec-loop/IMPLEMENTATION_PLAN.md` — the prioritised work items.
- The appended **Open pull-request context** block from the runner.
+- The appended **Local work-item branches** block from the runner. The
+ loop never pushes, so a work item it already built shows up here, not in
+ the PR context above.
- Only the spec(s) and source files relevant to the chosen work item —
do not read the whole tree.
Steps:
-1. Read the appended **Open pull-request context**. Treat open PRs as
+1. Read the appended **Open pull-request context** and **Local work-item
+ branches**. Treat both open PRs and existing local work-item branches as
in-flight work. Pick the single highest-priority work item from
`IMPLEMENTATION_PLAN.md`. If a **Tooling source** block is appended
below, read the plan from the control branch as it shows
(`git show <ref>:tools/spec-loop/IMPLEMENTATION_PLAN.md`), not from the
- working tree — the tree is on the integration base, which need not carry
- the plan. Pick an item not already substantially covered by an open PR.
- One only.
+ working tree, which is on the integration base and need not carry the
+ plan. Pick an item not already substantially covered by an open PR and
+ not already built as a local work-item branch (the loop never pushes, so
+ a built item lives only as a local branch until a human pushes it). One
+ only.
2. **Create its branch off the integration base**, then switch to it:
`git checkout -b <slug>` where `<slug>` is the work item's branch — the
bare slug, **no `spec/` or other prefix** (e.g.
@@ -66,9 +72,11 @@ gh pr create --web --base <integration-base> --head <slug> \
Rules:
- One work item per iteration. Do not bundle.
-- Do not duplicate in-flight work from open PRs. If the highest-priority
- plan item is covered by an open PR, skip it and choose the next
- uncovered item.
+- Do not duplicate in-flight work. If the highest-priority plan item is
+ already covered by an open PR or already exists as a local work-item
+ branch, skip it and choose the next uncovered item. Checking local
+ branches, not just open PRs, is what keeps the loop from rebuilding the
+ same item every iteration.
- If a work item is blocked, note why in its spec's `Known gaps` and pick
the next item instead.
- Stay inside the sandbox; never edit `.claude/settings.json`; never add a
diff --git a/tools/spec-loop/PROMPT_plan.md b/tools/spec-loop/PROMPT_plan.md
index afc4c57..709dfd4 100644
--- a/tools/spec-loop/PROMPT_plan.md
+++ b/tools/spec-loop/PROMPT_plan.md
@@ -12,6 +12,8 @@ Context to load first:
- `tools/spec-loop/specs/*` — the functional description of the product.
- `tools/spec-loop/IMPLEMENTATION_PLAN.md` (if present; may be stale).
- The appended **Open pull-request context** block from the runner.
+- The appended **Local work-item branches** block from the runner. Built
+ but un-pushed work items live here, not in the PR context.
Steps:
@@ -19,9 +21,12 @@ Steps:
actual code it names in **Where it lives** (`.claude/skills/`,
`tools/`, `docs/`). You may use parallel subagents for reading. Do NOT
assume something is missing — confirm with a code search first.
-2. Read the appended **Open pull-request context**. Treat open PRs as
- in-flight work. If an apparent gap is already substantially covered by
- an open PR (including draft PRs), do not add it as a planned work item.
+2. Read the appended **Open pull-request context** and **Local work-item
+ branches**. Treat both open PRs and existing local work-item branches as
+ in-flight work. If an apparent gap is already substantially covered by an
+ open PR (including draft PRs) or already built on a local work-item
+ branch, do not add it as a planned work item. The loop never pushes, so a
+ built item may exist only as a local branch with no PR yet.
3. For each spec, identify the **gaps**: a `proposed` area with no skill,
a documented step that drifted from the code, a missing test, a
`Known gaps` item. Each gap is a candidate work item.
@@ -37,8 +42,9 @@ Rules:
- Plan only. No edits to skills, tools, or docs. No commits in this beat.
- Keep the plan prioritised and concise; one work item = one branch = one
PR.
-- Do not duplicate in-flight work from open PRs. If a stale existing plan
- item is now covered by an open PR, remove it or mark it as in-flight
- rather than leaving it available for the build beat.
+- Do not duplicate in-flight work. If a stale existing plan item is now
+ covered by an open PR or already built on a local work-item branch, remove
+ it or mark it as in-flight rather than leaving it available for the build
+ beat.
- Treat `tools/` as the standard library — prefer extending an existing
tool over a new ad-hoc one.
diff --git a/tools/spec-loop/loop.sh b/tools/spec-loop/loop.sh
index 8e97a8c..ad96164 100755
--- a/tools/spec-loop/loop.sh
+++ b/tools/spec-loop/loop.sh
@@ -21,6 +21,12 @@
# .claude/settings.json `ask` — they are the human's step. The loop
# ends at a local commit and the build prompt prints the human-run
# push + `gh pr create --web` commands.
+# * NO REDOING BUILT WORK: because the loop never pushes, a work item it
+# already built exists only as a LOCAL BRANCH and has no open PR. Each
+# iteration therefore feeds the agent BOTH the open PRs and the local
+# work-item branches as in-flight work. Without the local-branch signal
+# the agent would re-pick the same top-priority plan item every
+# iteration and rebuild it forever (an endless loop).
#
# SECURITY — read before running:
# This loop runs the agent with `--dangerously-skip-permissions`, which
@@ -175,6 +181,56 @@ open_pr_context() {
fi
}
+# Local work-item branches the loop has already built. This is the companion
+# to open_pr_context: the loop never pushes, so a freshly built item has NO
+# open PR and is invisible to the PR check above. Listing it here as in-flight
+# work is what stops the agent re-picking the same top-priority plan item and
+# rebuilding it on a new branch every iteration. Reads refs only, so it is
+# correct regardless of which branch is currently checked out.
+local_branch_context() {
+ echo ""
+ echo "## Local work-item branches"
+ echo ""
+ echo "The runner collected this immediately before the iteration. This loop"
+ echo "never pushes and never opens a PR, so a work item it has already built"
+ echo "exists ONLY as a local branch and will NOT appear in the open-PR"
+ echo "context above. Treat every branch listed here as work that is already"
+ echo "built or in flight: do not add a plan item, and do not pick a build"
+ echo "item, whose slug matches one of these branches or whose change one of"
+ echo "them already carries. Checking these branches (not just open PRs) is"
+ echo "what keeps the loop from rebuilding the same item every iteration."
+ echo ""
+
+ local have_base=false
+ if git rev-parse --verify --quiet "refs/heads/$BASE" >/dev/null 2>&1; then
+ have_base=true
+ fi
+
+ # Every local branch except the integration base and the control branch
+ # (where the tooling lives); those two are never work-item branches.
+ local branches
+ branches="$(git for-each-ref --format='%(refname:short)' refs/heads/ \
+ | grep -vxF "$BASE" \
+ | grep -vxF "$TOOLING_REF")"
+
+ if [ -z "$branches" ]; then
+ echo "- No local work-item branches found."
+ return 0
+ fi
+
+ local b subject ahead
+ while IFS= read -r b; do
+ [ -n "$b" ] || continue
+ subject="$(git log -1 --format='%s' "$b" 2>/dev/null)"
+ if [ "$have_base" = true ]; then
+ ahead="$(git rev-list --count "$BASE..$b" 2>/dev/null)"
+ echo "- ${b} (${ahead:-?} commit(s) ahead of ${BASE}): ${subject}"
+ else
+ echo "- ${b}: ${subject}"
+ fi
+ done <<< "$branches"
+}
+
while true; do
if [ "$MAX_ITERATIONS" -gt 0 ] && [ "$ITERATION" -ge "$MAX_ITERATIONS" ]; then
echo "Reached max iterations: $MAX_ITERATIONS"; break
@@ -228,6 +284,7 @@ while true; do
rm -f "$PROMPT_WITH_CONTEXT"; break
fi
open_pr_context >> "$PROMPT_WITH_CONTEXT"
+ local_branch_context >> "$PROMPT_WITH_CONTEXT"
if [ "$BUILD_ITERATION" = true ]; then
# The work-item branch forks off BASE (e.g. main), which need not
@@ -261,32 +318,6 @@ while true; do
} >> "$PROMPT_WITH_CONTEXT"
fi
- # Freshness check: refuse to fork off a stale base. Fetch the base's
- # tracking branch and verify BASE is not behind it. Forking off a stale
- # local base is how a loop re-does work that's already merged upstream:
- # the agent sees "no such file" locally, dutifully re-creates it, and
- # the resulting branch collides with the merged PR's source branch on
- # the remote.
- BASE_UPSTREAM="$(git rev-parse --abbrev-ref "${BASE}@{upstream}" 2>/dev/null || true)"
- if [ -n "$BASE_UPSTREAM" ]; then
- BASE_REMOTE="${BASE_UPSTREAM%%/*}"
- if ! git fetch --quiet "$BASE_REMOTE" "$BASE" 2>/dev/null; then
- echo "⚠ Could not fetch '$BASE_REMOTE' — freshness check skipped (network or auth issue)." >&2
- else
- BEHIND_BY="$(git rev-list --count "${BASE}..${BASE_UPSTREAM}" 2>/dev/null || echo 0)"
- if [ "$BEHIND_BY" -gt 0 ]; then
- echo "✗ Base '$BASE' is $BEHIND_BY commit(s) behind '$BASE_UPSTREAM'." >&2
- echo " Fast-forward before re-running:" >&2
- echo " git checkout $BASE && git merge --ff-only $BASE_UPSTREAM" >&2
- echo " Forking off a stale base re-does merged work — the new branch" >&2
- echo " may collide with one already on the remote for the same change." >&2
- rm -f "$PROMPT_WITH_CONTEXT"; break
- fi
- fi
- else
- echo "⚠ Base '$BASE' has no upstream tracking branch — freshness check skipped." >&2
- fi
-
# Check out the base now — right before the agent runs, not earlier —
# so the reads above came from the control branch. The agent then
# forks its own <slug> branch off this base.
@@ -330,24 +361,6 @@ while true; do
# Report the work-item branch the agent produced, by name, so you know
# exactly what to push.
if [ "$CUR_BRANCH" != "$BASE" ] && [ "$CUR_BRANCH" != "$TOOLING_REF" ]; then
- # Branch-name collision check: a remote branch with the same name
- # often means the agent just re-did work that already shipped under
- # this slug (a merged PR's source branch typically lingers on the
- # remote). Warn loudly — pushing would either be rejected or, worse,
- # overwrite the merged history. Check every configured remote, not
- # just origin: a fork-based workflow has the lineage on `upstream`.
- COLLISION_FOUND=false
- for remote in $(git remote); do
- if git ls-remote --heads --exit-code "$remote" "$CUR_BRANCH" >/dev/null 2>&1; then
- echo "⚠ Remote '$remote' already has a branch named '$CUR_BRANCH'." >&2
- COLLISION_FOUND=true
- fi
- done
- if [ "$COLLISION_FOUND" = true ]; then
- echo " Likely the source branch of a PR that already shipped under this slug." >&2
- echo " Inspect before pushing — do not push blind:" >&2
- echo " git fetch --all && git log --oneline --all -- <changed-file>" >&2
- fi
echo "[ new branch ] $CUR_BRANCH (forked off $BASE)"
echo " push it with: git push -u origin $CUR_BRANCH"
else