This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git


The following commit(s) were added to refs/heads/main by this push:
     new 5c6bd32  Add spec-driven build loop and product specs (#250)
5c6bd32 is described below

commit 5c6bd32aa0ffa45089816ae811b07174d0f884fb
Author: Justin Mclean <[email protected]>
AuthorDate: Wed May 27 08:04:06 2026 +1000

    Add spec-driven build loop and product specs (#250)
    
    * feat(spec-loop): add spec-driven build loop and product specs
    
    Introduce a spec-driven development loop under tools/spec-loop/ that
    develops and maintains the framework against a written description of
    what it does.
    
    - specs/ describes the actual functionality (the four live modes, the
      security lifecycle, privacy gate, sandbox, CVE tooling, adoption,
      adapters, meta tooling) as topic-named files, separate from the RFCs.
    - IMPLEMENTATION_PLAN.md holds the prioritised gaps as work items.
    - loop.sh runs plan / build / update / consolidate. Build implements one
      work item per branch (one work item = one branch = one PR) and never
      pushes or opens a PR; the human does that. update back-fills specs from
      functionality contributed outside the loop.
    - docs/spec-driven-development.md explains how it works.
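A minimal sketch of how a runner could map each beat to its prompt file. The PROMPT_<mode>.md naming follows the prompt files this commit touches (PROMPT_plan.md, PROMPT_build.md, PROMPT_update.md, PROMPT_consolidate). The function name and the default directory argument are hypothetical and are not taken from loop.sh.

```bash
#!/usr/bin/env bash
# Hypothetical helper: resolve the prompt file for one of the four beats.
# Naming follows tools/spec-loop/PROMPT_<mode>.md; the function is illustrative.
prompt_for_mode() {
    local mode="$1" dir="${2:-tools/spec-loop}"
    case "$mode" in
        plan|build|update|consolidate)
            printf '%s/PROMPT_%s.md\n' "$dir" "$mode"
            ;;
        *)
            # Reject anything else rather than guessing a prompt.
            echo "unknown mode: $mode (expected plan, build, update or consolidate)" >&2
            return 1
            ;;
    esac
}
```

Failing loudly on an unknown mode keeps a typo from silently running the wrong beat.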
    
    Generated-by: Claude (Opus 4.7)
    
    * docs(skills): require an eval suite for every skill
    
    Add to AGENTS.md § Reusable skills the convention that every skill ships
    a behavioural eval suite under tools/skill-evals/evals/<skill-name>/
    covering each pipeline step with fixture cases — a skill PR without a
    matching eval is incomplete. Mirror the rule into the spec-loop build
    prompt and operational context so the loop enforces it, and record the
    "back-fill missing eval suites" gap in the plan.
    
    Generated-by: Claude (Opus 4.7)
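As an illustration of the eval-suite convention, here is a hypothetical check that lists skills with no matching eval suite. It assumes one directory per skill under .claude/skills/ and an eval directory with the same name under tools/skill-evals/evals/. missing_eval_suites is not an existing tool in the repository.

```bash
#!/usr/bin/env bash
# Hypothetical check: print each skill under <root>/.claude/skills/ that has
# no eval suite at <root>/tools/skill-evals/evals/<skill-name>/.
missing_eval_suites() {
    local root="${1:-.}" skill_dir name
    for skill_dir in "$root"/.claude/skills/*/; do
        [ -d "$skill_dir" ] || continue   # glob matched nothing
        name="$(basename "$skill_dir")"
        if [ ! -d "$root/tools/skill-evals/evals/$name" ]; then
            echo "$name"
        fi
    done
}
```

A CI step could then fail on any output, for example `missing="$(missing_eval_suites .)"; [ -z "$missing" ] || exit 1`.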
    
    * fix(spec-loop): default loop base to the current branch, not spec-driven
    
    SPEC_LOOP_BASE defaulted to spec-driven — the throwaway scaffolding
    branch — which is a dangling reference the moment this PR merges. Default
    it instead to the branch the loop is started on (fall back to main if
    detached), and point the work-item PR examples at main.
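Here is a sketch of the new default, assuming the base is resolved with `git symbolic-ref`. Only SPEC_LOOP_BASE and the main fallback come from this commit. default_base is a hypothetical helper, and loop.sh may resolve the branch differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: the branch the loop was started on, or main when
# HEAD is detached (or when run outside a repository).
default_base() {
    local current
    # With -q, symbolic-ref exits non-zero silently on a detached HEAD.
    if current="$(git symbolic-ref --short -q HEAD 2>/dev/null)"; then
        printf '%s\n' "$current"
    else
        printf 'main\n'
    fi
}

# An explicit SPEC_LOOP_BASE still wins over the computed default.
BASE="${SPEC_LOOP_BASE:-$(default_base)}"
```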
    
    Generated-by: Claude (Opus 4.7)
    
    * fix(spec-loop): satisfy markdownlint and doctoc
    
    - Tag the push/PR command fences in PROMPT_build.md and PROMPT_update.md
      with `text` so markdownlint MD040 (fenced-code-language) passes.
    - Exclude tools/spec-loop/ from the doctoc hook: the specs carry YAML
      frontmatter and the prompts are short single-purpose docs — same
      rationale as the existing skill and skill-evals exclusions.
    
    Generated-by: Claude (Opus 4.7)
    
    * docs(spec-loop): explain --dangerously-skip-permissions in the security model
    
    The loop runs the agent headless with --dangerously-skip-permissions.
    Document how that fits the layered sandbox: the flag bypasses the agent
    permission layer (.claude/settings.json deny/ask) but NOT the OS sandbox
    (clean-env + filesystem/network), which stays the real boundary — matching
    the flag's own "sandboxes only" guidance.
    
    - Add a "Security and the dangerously-skip-permissions flag" section to
      docs/spec-driven-development.md (what each sandbox layer does, why the
      loop stays safe: run-in-sandbox, no credentials, structural containment).
    - Add a Security callout to tools/spec-loop/README.md and a SECURITY
      header block to loop.sh.
    - Harden the invocation: --disallowedTools "Bash(git push *)" "Bash(gh *)"
      as defence in depth so a stray push/PR cannot reach the remote.
    
    Generated-by: Claude (Opus 4.7)
    
    * fix(spec-loop): make plan consolidation hysteretic and not livelock
    
    The build loop switched to a consolidation round whenever the plan
    exceeded 500 lines, but the consolidate beat must preserve every planned
    work item. A plan that was long because of pending work (not stale
    history) could therefore never drop below the threshold, so every
    subsequent build iteration re-consolidated forever and never built.
    
    - Consolidate at most once (CONSOLIDATE_TRIED latch); if still over after
      one round, build anyway and note that the length is planned work. The
      latch resets when the plan drops back under the limit.
    - Give PROMPT_consolidate a real target (~300 lines, below the 500
      trigger) for hysteresis, while preserving every planned work item.
    - Make the threshold configurable via SPEC_LOOP_PLAN_MAX (default 500)
      and document it.
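The latch can be sketched as follows. CONSOLIDATE_TRIED and SPEC_LOOP_PLAN_MAX are the names used in this commit. next_beat and NEXT_BEAT are hypothetical, and the real loop.sh may structure the check differently. The function sets a global instead of printing its result, because a `$(next_beat ...)` call would run in a subshell and discard the latch update.

```bash
#!/usr/bin/env bash
# Sketch of the hysteretic consolidation latch (illustrative names aside
# from CONSOLIDATE_TRIED and SPEC_LOOP_PLAN_MAX).
PLAN_MAX="${SPEC_LOOP_PLAN_MAX:-500}"
CONSOLIDATE_TRIED=false

# Sets NEXT_BEAT for a plan of the given line count.
next_beat() {
    local plan_lines="$1"
    if [ "$plan_lines" -le "$PLAN_MAX" ]; then
        CONSOLIDATE_TRIED=false   # back under the limit: re-arm the latch
        NEXT_BEAT=build
    elif [ "$CONSOLIDATE_TRIED" = false ]; then
        CONSOLIDATE_TRIED=true    # consolidate at most once per excursion
        NEXT_BEAT=consolidate
    else
        NEXT_BEAT=build           # still long after one round: planned work
    fi
}
```

A 600-line plan therefore gets one consolidate round and then builds, instead of re-consolidating on every iteration.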
    
    Generated-by: Claude (Opus 4.7)
    
    * loop improvements
    
    * so we can run this before merging
    
    * use git checkout, not git switch, so it works with slightly older versions of git
    
    * read tooling from control branch, build work items on main
    
    * remove the "spec/" prefix from work-item branch names
    
    * fix(spec-loop): correct update-mode note, arg validation, spinner, deny syntax
    
    Address self-review findings on loop.sh:
    
    - update mode no longer receives the build-only "do NOT edit specs" note
      when launched from a branch != BASE; it is now told to author the updated
      specs on the work branch (update is the one beat that writes specs).
    - reject a non-numeric iteration count instead of erroring to stderr and
      silently treating it as 0 (i.e. running unbounded).
    - spinner indexes the braille frames as an array rather than by byte offset,
      so it renders correctly under a C/POSIX locale (the clean-env sandbox case).
    - correct --disallowedTools to the colon wildcard form
      (Bash(git push:*) / Bash(gh:*)) so the defense-in-depth deny actually
      matches git push / gh invocations.
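Here is a sketch of the argument check and the array-indexed spinner. The function names are hypothetical, and the braille frames are illustrative because this diff does not show the frames loop.sh actually uses. An iteration count of 0 still means unbounded, which matches the `-gt 0` guard in the main loop.

```bash
#!/usr/bin/env bash
# Accept only a non-negative integer iteration count (0 = unbounded).
validate_iterations() {
    case "$1" in
        ''|*[!0-9]*)
            echo "error: iteration count must be a non-negative integer, got '$1'" >&2
            return 1
            ;;
    esac
}

# Frames stored as whole array elements. Under LC_ALL=C, ${str:i:1} on a
# multibyte string slices bytes and would split a 3-byte braille glyph;
# indexing array elements avoids that in any locale.
SPINNER_FRAMES=(⠋ ⠙ ⠹ ⠸ ⠼ ⠴ ⠦ ⠧ ⠇ ⠏)
spinner_frame() {
    local tick="$1"
    printf '%s' "${SPINNER_FRAMES[tick % ${#SPINNER_FRAMES[@]}]}"
}
```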
    
    Generated-by: Claude Code (Opus 4.7)
    
    * fix an endless loop: check unpushed local branches as well as open PRs
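The runner in this commit only lists local branches and leaves the skip decision to the agent. The helpers below are a hypothetical runner-side illustration of the same check: they skip any candidate slug that already exists under refs/heads/. slug_built_locally and first_unbuilt_slug are not part of loop.sh.

```bash
#!/usr/bin/env bash
# True if a local branch with this exact name exists. The loop never pushes,
# so refs/heads/ is the only place a freshly built item shows up.
slug_built_locally() {
    git show-ref --verify --quiet "refs/heads/$1"
}

# Given slugs in priority order, print the first one not yet built locally;
# return non-zero if every candidate already has a branch.
first_unbuilt_slug() {
    local slug
    for slug in "$@"; do
        if ! slug_built_locally "$slug"; then
            printf '%s\n' "$slug"
            return 0
        fi
    done
    return 1
}
```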
---
 tools/spec-loop/AGENTS.md       |   5 ++
 tools/spec-loop/PROMPT_build.md |  22 ++++++---
 tools/spec-loop/PROMPT_plan.md  |  18 ++++---
 tools/spec-loop/loop.sh         | 101 +++++++++++++++++++++++-----------------
 4 files changed, 89 insertions(+), 57 deletions(-)

diff --git a/tools/spec-loop/AGENTS.md b/tools/spec-loop/AGENTS.md
index 7a839b8..1c4b776 100644
--- a/tools/spec-loop/AGENTS.md
+++ b/tools/spec-loop/AGENTS.md
@@ -60,6 +60,11 @@ commit.
 - The **`update`** beat (specs fell behind code others contributed)
   branches `sync-specs` and edits `specs/` **only** — it documents
   reality, it never changes a skill, tool, or doc outside the spec dir.
+- The runner feeds each iteration **both** the open PRs and the local
+  work-item branches as in-flight work. Because the loop never pushes, a
+  built-but-un-pushed item exists only as a local branch with no PR, so the
+  local-branch list (not just open PRs) is what prevents the loop from
+  rebuilding the same item every iteration.
 
 ## Hard limits (governance — do not cross)
 
diff --git a/tools/spec-loop/PROMPT_build.md b/tools/spec-loop/PROMPT_build.md
index e612e37..353c5f2 100644
--- a/tools/spec-loop/PROMPT_build.md
+++ b/tools/spec-loop/PROMPT_build.md
@@ -11,19 +11,25 @@ Context to load first:
   applies (commit trailers, placeholder convention, confidentiality).
 - `tools/spec-loop/IMPLEMENTATION_PLAN.md` — the prioritised work items.
 - The appended **Open pull-request context** block from the runner.
+- The appended **Local work-item branches** block from the runner. The
+  loop never pushes, so a work item it already built shows up here, not in
+  the PR context above.
 - Only the spec(s) and source files relevant to the chosen work item —
   do not read the whole tree.
 
 Steps:
 
-1. Read the appended **Open pull-request context**. Treat open PRs as
+1. Read the appended **Open pull-request context** and **Local work-item
+   branches**. Treat both open PRs and existing local work-item branches as
    in-flight work. Pick the single highest-priority work item from
    `IMPLEMENTATION_PLAN.md`. If a **Tooling source** block is appended
    below, read the plan from the control branch as it shows
    (`git show <ref>:tools/spec-loop/IMPLEMENTATION_PLAN.md`), not from the
-   working tree — the tree is on the integration base, which need not carry
-   the plan. Pick an item not already substantially covered by an open PR.
-   One only.
+   working tree, which is on the integration base and need not carry the
+   plan. Pick an item not already substantially covered by an open PR and
+   not already built as a local work-item branch (the loop never pushes, so
+   a built item lives only as a local branch until a human pushes it). One
+   only.
 2. **Create its branch off the integration base**, then switch to it:
    `git checkout -b <slug>` where `<slug>` is the work item's branch — the
    bare slug, **no `spec/` or other prefix** (e.g.
@@ -66,9 +72,11 @@ gh pr create --web --base <integration-base> --head <slug> \
 Rules:
 
 - One work item per iteration. Do not bundle.
-- Do not duplicate in-flight work from open PRs. If the highest-priority
-  plan item is covered by an open PR, skip it and choose the next
-  uncovered item.
+- Do not duplicate in-flight work. If the highest-priority plan item is
+  already covered by an open PR or already exists as a local work-item
+  branch, skip it and choose the next uncovered item. Checking local
+  branches, not just open PRs, is what keeps the loop from rebuilding the
+  same item every iteration.
 - If a work item is blocked, note why in its spec's `Known gaps` and pick
   the next item instead.
 - Stay inside the sandbox; never edit `.claude/settings.json`; never add a
diff --git a/tools/spec-loop/PROMPT_plan.md b/tools/spec-loop/PROMPT_plan.md
index afc4c57..709dfd4 100644
--- a/tools/spec-loop/PROMPT_plan.md
+++ b/tools/spec-loop/PROMPT_plan.md
@@ -12,6 +12,8 @@ Context to load first:
 - `tools/spec-loop/specs/*` — the functional description of the product.
 - `tools/spec-loop/IMPLEMENTATION_PLAN.md` (if present; may be stale).
 - The appended **Open pull-request context** block from the runner.
+- The appended **Local work-item branches** block from the runner. Built
+  but un-pushed work items live here, not in the PR context.
 
 Steps:
 
@@ -19,9 +21,12 @@ Steps:
    actual code it names in **Where it lives** (`.claude/skills/`,
    `tools/`, `docs/`). You may use parallel subagents for reading. Do NOT
    assume something is missing — confirm with a code search first.
-2. Read the appended **Open pull-request context**. Treat open PRs as
-   in-flight work. If an apparent gap is already substantially covered by
-   an open PR (including draft PRs), do not add it as a planned work item.
+2. Read the appended **Open pull-request context** and **Local work-item
+   branches**. Treat both open PRs and existing local work-item branches as
+   in-flight work. If an apparent gap is already substantially covered by an
+   open PR (including draft PRs) or already built on a local work-item
+   branch, do not add it as a planned work item. The loop never pushes, so a
+   built item may exist only as a local branch with no PR yet.
 3. For each spec, identify the **gaps**: a `proposed` area with no skill,
    a documented step that drifted from the code, a missing test, a
    `Known gaps` item. Each gap is a candidate work item.
@@ -37,8 +42,9 @@ Rules:
 - Plan only. No edits to skills, tools, or docs. No commits in this beat.
 - Keep the plan prioritised and concise; one work item = one branch = one
   PR.
-- Do not duplicate in-flight work from open PRs. If a stale existing plan
-  item is now covered by an open PR, remove it or mark it as in-flight
-  rather than leaving it available for the build beat.
+- Do not duplicate in-flight work. If a stale existing plan item is now
+  covered by an open PR or already built on a local work-item branch, remove
+  it or mark it as in-flight rather than leaving it available for the build
+  beat.
 - Treat `tools/` as the standard library — prefer extending an existing
   tool over a new ad-hoc one.
diff --git a/tools/spec-loop/loop.sh b/tools/spec-loop/loop.sh
index 8e97a8c..ad96164 100755
--- a/tools/spec-loop/loop.sh
+++ b/tools/spec-loop/loop.sh
@@ -21,6 +21,12 @@
 #     .claude/settings.json `ask` — they are the human's step. The loop
 #     ends at a local commit and the build prompt prints the human-run
 #     push + `gh pr create --web` commands.
+#   * NO REDOING BUILT WORK: because the loop never pushes, a work item it
+#     already built exists only as a LOCAL BRANCH and has no open PR. Each
+#     iteration therefore feeds the agent BOTH the open PRs and the local
+#     work-item branches as in-flight work. Without the local-branch signal
+#     the agent would re-pick the same top-priority plan item every
+#     iteration and rebuild it forever (an endless loop).
 #
 # SECURITY — read before running:
 #   This loop runs the agent with `--dangerously-skip-permissions`, which
@@ -175,6 +181,56 @@ open_pr_context() {
     fi
 }
 
+# Local work-item branches the loop has already built. This is the companion
+# to open_pr_context: the loop never pushes, so a freshly built item has NO
+# open PR and is invisible to the PR check above. Listing it here as in-flight
+# work is what stops the agent re-picking the same top-priority plan item and
+# rebuilding it on a new branch every iteration. Reads refs only, so it is
+# correct regardless of which branch is currently checked out.
+local_branch_context() {
+    echo ""
+    echo "## Local work-item branches"
+    echo ""
+    echo "The runner collected this immediately before the iteration. This loop"
+    echo "never pushes and never opens a PR, so a work item it has already built"
+    echo "exists ONLY as a local branch and will NOT appear in the open-PR"
+    echo "context above. Treat every branch listed here as work that is already"
+    echo "built or in flight: do not add a plan item, and do not pick a build"
+    echo "item, whose slug matches one of these branches or whose change one of"
+    echo "them already carries. Checking these branches (not just open PRs) is"
+    echo "what keeps the loop from rebuilding the same item every iteration."
+    echo ""
+
+    local have_base=false
+    if git rev-parse --verify --quiet "refs/heads/$BASE" >/dev/null 2>&1; then
+        have_base=true
+    fi
+
+    # Every local branch except the integration base and the control branch
+    # (where the tooling lives); those two are never work-item branches.
+    local branches
+    branches="$(git for-each-ref --format='%(refname:short)' refs/heads/ \
+        | grep -vxF "$BASE" \
+        | grep -vxF "$TOOLING_REF")"
+
+    if [ -z "$branches" ]; then
+        echo "- No local work-item branches found."
+        return 0
+    fi
+
+    local b subject ahead
+    while IFS= read -r b; do
+        [ -n "$b" ] || continue
+        subject="$(git log -1 --format='%s' "$b" 2>/dev/null)"
+        if [ "$have_base" = true ]; then
+            ahead="$(git rev-list --count "$BASE..$b" 2>/dev/null)"
+            echo "- ${b} (${ahead:-?} commit(s) ahead of ${BASE}): ${subject}"
+        else
+            echo "- ${b}: ${subject}"
+        fi
+    done <<< "$branches"
+}
+
 while true; do
    if [ "$MAX_ITERATIONS" -gt 0 ] && [ "$ITERATION" -ge "$MAX_ITERATIONS" ]; then
         echo "Reached max iterations: $MAX_ITERATIONS"; break
@@ -228,6 +284,7 @@ while true; do
         rm -f "$PROMPT_WITH_CONTEXT"; break
     fi
     open_pr_context >> "$PROMPT_WITH_CONTEXT"
+    local_branch_context >> "$PROMPT_WITH_CONTEXT"
 
     if [ "$BUILD_ITERATION" = true ]; then
         # The work-item branch forks off BASE (e.g. main), which need not
@@ -261,32 +318,6 @@ while true; do
             } >> "$PROMPT_WITH_CONTEXT"
         fi
 
-        # Freshness check: refuse to fork off a stale base. Fetch the base's
-        # tracking branch and verify BASE is not behind it. Forking off a stale
-        # local base is how a loop re-does work that's already merged upstream:
-        # the agent sees "no such file" locally, dutifully re-creates it, and
-        # the resulting branch collides with the merged PR's source branch on
-        # the remote.
-        BASE_UPSTREAM="$(git rev-parse --abbrev-ref "${BASE}@{upstream}" 2>/dev/null || true)"
-        if [ -n "$BASE_UPSTREAM" ]; then
-            BASE_REMOTE="${BASE_UPSTREAM%%/*}"
-            if ! git fetch --quiet "$BASE_REMOTE" "$BASE" 2>/dev/null; then
-                echo "⚠ Could not fetch '$BASE_REMOTE' — freshness check skipped (network or auth issue)." >&2
-            else
-                BEHIND_BY="$(git rev-list --count "${BASE}..${BASE_UPSTREAM}" 2>/dev/null || echo 0)"
-                if [ "$BEHIND_BY" -gt 0 ]; then
-                    echo "✗ Base '$BASE' is $BEHIND_BY commit(s) behind '$BASE_UPSTREAM'." >&2
-                    echo "  Fast-forward before re-running:" >&2
-                    echo "    git checkout $BASE && git merge --ff-only $BASE_UPSTREAM" >&2
-                    echo "  Forking off a stale base re-does merged work — the new branch" >&2
-                    echo "  may collide with one already on the remote for the same change." >&2
-                    rm -f "$PROMPT_WITH_CONTEXT"; break
-                fi
-            fi
-        else
-            echo "⚠ Base '$BASE' has no upstream tracking branch — freshness check skipped." >&2
-        fi
-
         # Check out the base now — right before the agent runs, not earlier —
         # so the reads above came from the control branch. The agent then
         # forks its own <slug> branch off this base.
@@ -330,24 +361,6 @@ while true; do
         # Report the work-item branch the agent produced, by name, so you know
         # exactly what to push.
         if [ "$CUR_BRANCH" != "$BASE" ] && [ "$CUR_BRANCH" != "$TOOLING_REF" ]; then
-            # Branch-name collision check: a remote branch with the same name
-            # often means the agent just re-did work that already shipped under
-            # this slug (a merged PR's source branch typically lingers on the
-            # remote). Warn loudly — pushing would either be rejected or, worse,
-            # overwrite the merged history. Check every configured remote, not
-            # just origin: a fork-based workflow has the lineage on `upstream`.
-            COLLISION_FOUND=false
-            for remote in $(git remote); do
-                if git ls-remote --heads --exit-code "$remote" "$CUR_BRANCH" >/dev/null 2>&1; then
-                    echo "⚠ Remote '$remote' already has a branch named '$CUR_BRANCH'." >&2
-                    COLLISION_FOUND=true
-                fi
-            done
-            if [ "$COLLISION_FOUND" = true ]; then
-                echo "  Likely the source branch of a PR that already shipped under this slug." >&2
-                echo "  Inspect before pushing — do not push blind:" >&2
-                echo "    git fetch --all && git log --oneline --all -- <changed-file>" >&2
-            fi
             echo "[ new branch ] $CUR_BRANCH  (forked off $BASE)"
             echo "               push it with:  git push -u origin $CUR_BRANCH"
         else
