(airflow-steward) branch main updated: feat(agent-isolation): claude-term-bg.sh — distinguish blocked-on-decision from finished-idle (#503)

potiuk Thu, 11 Jun 2026 15:14:01 -0700

This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git



The following commit(s) were added to refs/heads/main by this push:
     new c51ad1f6 feat(agent-isolation): claude-term-bg.sh — distinguish 
blocked-on-decision from finished-idle (#503)
c51ad1f6 is described below

commit c51ad1f608dae7a32bd84cf26fea1f01667bc844
Author: Jarek Potiuk <[email protected]>
AuthorDate: Fri Jun 12 00:13:46 2026 +0200

    feat(agent-isolation): claude-term-bg.sh — distinguish blocked-on-decision 
from finished-idle (#503)
    
    The terminal-tint helper previously tinted on every `Stop` event, so a
    turn that merely *finished* (Claude idle until you start the next thing)
    looked identical to a turn genuinely blocked on your decision. This
    reworks the state machine to tint only when Claude actually wants you to
    act, across six hooks:
    
      Stop          -> stop    (heuristic: tint only if the final assistant
                                message reads as a question/request; a
                                completion stays calm. Needs python3/python,
                                else defaults calm)
      PreToolUse     -> wait    (matcher AskUserQuestion — exact signal)
      PostToolUse    -> reset   (matcher * — calm while working; clears the
                                tint the instant you act on a prompt)
      Notification   -> notify  (tint permission prompts only; idle ping no-op)
      UserPromptSubmit/SessionStart -> reset
    
    PostToolUse (not PreToolUse) now carries the general reset so it can clear
    a tint the permission prompt itself set; the idle ping is a deliberate
    no-op so it can't wipe a pending question's tint after ~60s.
    
    Docs updated in lockstep: the wiring table, install JSON, and install-skill
    checklist in docs/setup/secure-agent-setup.md, plus the README table row.
    
    Generated-by: Claude Code (Opus 4.8)
---
 docs/setup/secure-agent-setup.md        |  75 ++++++++++++-------
 tools/agent-isolation/README.md         |   2 +-
 tools/agent-isolation/claude-term-bg.sh | 124 +++++++++++++++++++++++++++++---
 3 files changed, 166 insertions(+), 35 deletions(-)

diff --git a/docs/setup/secure-agent-setup.md b/docs/setup/secure-agent-setup.md
index f5277c2c..745db14c 100644
--- a/docs/setup/secure-agent-setup.md
+++ b/docs/setup/secure-agent-setup.md
@@ -1305,23 +1305,39 @@ elsewhere. The framework ships
 
[`tools/agent-isolation/claude-term-bg.sh`](../../tools/agent-isolation/claude-term-bg.sh)
 to make a **calm baseline the normal state and tint the background
 only when Claude genuinely wants you to act** — never while it is
-working. The model is two states, wired across five hooks:
+working, and never when it merely *finished* a turn and is idle
+until you start the next thing. Those last two look identical at the
+`Stop` event, so the model leans on three "Claude is asking you for
+something" signals (two exact, one heuristic), wired across six
+hooks:
 
 | Moment | Hook → action | Background |
 |---|---|---|
-| Turn finished — your turn to respond | `Stop` → `wait` | tinted (muted 
indigo `#2a1a3a`) |
+| Turn ended on a genuine question/request | `Stop` → `stop` | tinted (muted 
indigo `#2a1a3a`) |
+| Turn ended on a completion ("Done.") | `Stop` → `stop` | calm |
+| Structured question posed | `PreToolUse` (matcher `AskUserQuestion`) → 
`wait` | tinted |
 | Blocked on a permission prompt | `Notification` → `notify` | tinted |
-| Actively working (running a tool) | `PreToolUse` → `reset` | calm |
-| Fresh/idle session, plain idle ping | `SessionStart` / `Notification` → 
`reset`/`notify` | calm |
-| You submit a reply | `UserPromptSubmit` → `reset` | calm |
-
-Three details make the model behave:
-
-- **`PreToolUse` → `reset`** fires on every tool call, so the moment
-  Claude resumes work — including right after you approve a
-  permission prompt that had tinted the screen — it returns to calm.
-  Without it, an approved-permission tint would linger until the
-  turn ended.
+| Actively working / you just acted | `PostToolUse` (matcher `*`) → `reset` | 
calm |
+| Plain 60-second idle ping | `Notification` → `notify` (no-op) | unchanged |
+| Fresh session, or you submit a reply | `SessionStart` / `UserPromptSubmit` → 
`reset` | calm |
+
+Four details make the model behave:
+
+- **`Stop` → `stop`** is the only non-exact signal. It reads the
+  last assistant text message from the session transcript (the path
+  arrives on stdin in the `Stop` payload) and tints only when that
+  message ends as a question (`…?`) or with a strong trailing
+  request ("want me to", "would you like", "should I", "OK to",
+  "your call", …); a statement-shaped completion stays calm. Needs
+  `python3`/`python` on `PATH`; if absent, `stop` defaults to calm
+  and only the two exact signals tint.
+- **`PostToolUse` → `reset`** (not `PreToolUse`) clears the "you
+  just acted" tint. `PreToolUse` fires *before* the permission
+  prompt is shown, so it cannot clear a tint the prompt itself
+  sets; `PostToolUse` fires *after* the tool completes — the moment
+  your approval lets work resume — so it is what returns the screen
+  to calm. `PreToolUse` is reserved for the `AskUserQuestion` →
+  `wait` tint.
 - **`SessionStart` → `reset`** clears any tint a *previous* session
   left behind (OSC background changes persist in the terminal
   across processes, so a session closed mid-wait would otherwise
@@ -1330,8 +1346,10 @@ Three details make the model behave:
 - **`Notification` → `notify`** is selective: the same hook fires
   both for permission prompts *and* the plain 60-second idle ping,
   so the script reads the notification payload on stdin and tints
-  only when the message is a permission/attention prompt; an idle
-  session stays calm.
+  only for a permission/attention prompt. The idle ping is a
+  deliberate **no-op** (not a reset) — otherwise a turn that ended
+  on a genuine question, tinted by `stop`, would silently go calm
+  after a minute.
 
 **Two mechanics make this work** (both are easy to get wrong):
 
@@ -1364,21 +1382,26 @@ cp 
/path/to/airflow-steward/tools/agent-isolation/claude-term-bg.sh \
 chmod +x ~/.claude/scripts/claude-term-bg.sh
 ```
 
-Wire it into `~/.claude/settings.json` under four hook events. If
+Wire it into `~/.claude/settings.json` under six hook events. If
 you already have hooks on any of these events, add the command as
 an extra entry rather than replacing the existing array. The
-`CLAUDE_RESET_BG=#000000` prefix makes the calm state a
-deterministic black (recommended — it sidesteps the OSC-111 reset
-gap described above); drop it to fall back to profile-default
-reset.
+`PreToolUse` entry uses the `AskUserQuestion` matcher (not `*`), so
+it tints only on a structured question; `PostToolUse` carries the
+general `reset`. The `CLAUDE_RESET_BG=#000000` prefix makes the
+calm state a deterministic black (recommended — it sidesteps the
+OSC-111 reset gap described above); drop it to fall back to
+profile-default reset.
 
 ```jsonc
 {
   "hooks": {
     "Stop": [
-      { "hooks": [ { "type": "command", "command": 
"~/.claude/scripts/claude-term-bg.sh wait" } ] }
+      { "hooks": [ { "type": "command", "command": "CLAUDE_RESET_BG=#000000 
~/.claude/scripts/claude-term-bg.sh stop" } ] }
     ],
     "PreToolUse": [
+      { "matcher": "AskUserQuestion", "hooks": [ { "type": "command", 
"command": "~/.claude/scripts/claude-term-bg.sh wait" } ] }
+    ],
+    "PostToolUse": [
       { "matcher": "*", "hooks": [ { "type": "command", "command": 
"CLAUDE_RESET_BG=#000000 ~/.claude/scripts/claude-term-bg.sh reset" } ] }
     ],
     "UserPromptSubmit": [
@@ -1817,11 +1840,13 @@ Then walk through:
    on me (a pure quality-of-life signal, no security effect).
    **Default no.** Only if I say yes: copy
    `<airflow-steward>/tools/agent-isolation/claude-term-bg.sh`
-   into `~/.claude/scripts/` and `chmod +x` it, then add five
+   into `~/.claude/scripts/` and `chmod +x` it, then add six
    hooks to `~/.claude/settings.json`, merging into any existing
-   arrays on those events — `Stop` → `claude-term-bg.sh wait`;
-   `UserPromptSubmit`, `SessionStart`, and `PreToolUse` (matcher
-   `*`) → `claude-term-bg.sh reset`; and `Notification` →
+   arrays on those events — `Stop` → `claude-term-bg.sh stop`
+   (heuristic tint on a question/request, calm on a completion);
+   `PreToolUse` (matcher `AskUserQuestion`) → `claude-term-bg.sh
+   wait`; `UserPromptSubmit`, `SessionStart`, and `PostToolUse`
+   (matcher `*`) → `claude-term-bg.sh reset`; and `Notification` →
    `claude-term-bg.sh notify`. Ask whether I want the calm state
    to be a deterministic black (prefix the reset/notify commands
    with `CLAUDE_RESET_BG=#000000`) or the terminal's profile
diff --git a/tools/agent-isolation/README.md b/tools/agent-isolation/README.md
index 228f915f..8d44b694 100644
--- a/tools/agent-isolation/README.md
+++ b/tools/agent-isolation/README.md
@@ -34,7 +34,7 @@ versions.
 | [`sandbox-error-hint.sh`](sandbox-error-hint.sh) | Claude Code `PostToolUse` 
hook (Bash matcher). Scans the tool's stdout + stderr for the three known 
sandbox-shaped error signatures (SSH agent / Yubikey unreachable, loopback 
port-bind blocked, docker / podman socket denied) and prints a `[sandbox-hint]` 
line pointing at the matching entry in 
[`docs/setup/sandbox-troubleshooting.md`](../../docs/setup/sandbox-troubleshooting.md).
 Fail-open: any unexpected JSON shape exits silent. Recomm [...]
 | [`sandbox-status-line.sh`](sandbox-status-line.sh) | Claude Code 
`statusLine` helper. Renders `<model> [sandbox]` (green) or `<model> [NO 
SANDBOX]` (bold red) based on `sandbox.enabled` in the active settings — 
project `settings.local.json` first, then project `settings.json`, then 
user-scope, mirroring Claude Code's own precedence. Reflects in-session 
`/sandbox` toggles (which persist to project `settings.local.json`). 
Recommended user-scope. |
 | [`sandbox-status-line-rich.sh`](sandbox-status-line-rich.sh) | Opt-in richer 
alternative to `sandbox-status-line.sh`. Same sandbox-state detection, plus 
folder name (hash-coloured), git branch + dirty + ahead/behind, per-branch PR 
title (cached, gated by `gh`), and a yellow `[sandbox-auto]` tag for the 
`autoAllowBashIfSandboxed` setting. Wire one *or* the other into 
`statusLine.command`. |
-| [`claude-term-bg.sh`](claude-term-bg.sh) | **Opt-in quality-of-life helper 
(not a security control).** Keeps a calm baseline background and tints it only 
when Claude genuinely wants you to act (never while working), so a window 
you've tabbed away from can't sit blocked unnoticed. Two states across five 
hooks: `Stop` → `wait` (turn finished, tint); `UserPromptSubmit` + 
`SessionStart` + `PreToolUse` → `reset` (calm — `PreToolUse` keeps it calm 
while a tool runs and clears an approved-per [...]
+| [`claude-term-bg.sh`](claude-term-bg.sh) | **Opt-in quality-of-life helper 
(not a security control).** Keeps a calm baseline background and tints it only 
when Claude genuinely wants you to act (never while working, and never when it 
merely *finished* a turn), so a window you've tabbed away from can't sit 
blocked unnoticed. Distinguishes "blocked on a decision" from "finished and 
idle" — which look identical at the `Stop` event — via three signals across six 
hooks: `Stop` → `stop` (heur [...]
 | [`sandbox-add-project-root.sh`](sandbox-add-project-root.sh) | Adds the 
current adopter repo's project root (and, with `--all-worktrees`, every linked 
git worktree's working dir) as an explicit absolute path to 
`sandbox.filesystem.allowRead` and `allowWrite` in the project-local, 
gitignored `<repo>/.claude/settings.local.json` — one entry per worktree, each 
in that worktree's own settings file. Defensive against [issue 
#197](https://github.com/apache/airflow-steward/issues/197) — `allo [...]
 | [`git-global-post-checkout.sh`](git-global-post-checkout.sh) | Universal 
`post-checkout` git hook installed at `~/.claude/git-hooks/post-checkout` when 
the operator picks **whole-user** scope in `setup-isolated-setup-install`. 
Activated by `git config --global core.hooksPath ~/.claude/git-hooks/` so every 
`git checkout` / `git clone` / `git worktree add` across the host invokes it. 
Two responsibilities (both best-effort + idempotent + `\|\| true`): (1) `setup 
verify --auto-fix-symlinks [...]
 
diff --git a/tools/agent-isolation/claude-term-bg.sh 
b/tools/agent-isolation/claude-term-bg.sh
index d8f730f1..0ef1d89d 100755
--- a/tools/agent-isolation/claude-term-bg.sh
+++ b/tools/agent-isolation/claude-term-bg.sh
@@ -20,16 +20,52 @@
 # waiting on YOU, and keep it calm the rest of the time.
 #
 # This is a quality-of-life helper, NOT a security control: it makes the
-# "Claude is blocked on me" state impossible to miss in a window you've
-# tabbed away from. Wire it into five Claude Code hooks (user-scope):
+# "Claude is BLOCKED ON A DECISION I have to make" state impossible to miss in
+# a window you've tabbed away from — while deliberately NOT tinting when Claude
+# merely finished a task and is idle until you start the next thing. Those two
+# look identical at the `Stop` event, so the script distinguishes them with
+# three genuine "Claude is asking you for something" signals (two exact, one
+# heuristic) and stays calm for everything else. Wire it into six hooks
+# (user-scope):
 #
-#   "Stop"              -> claude-term-bg.sh wait    (turn finished — your 
turn, tint)
+#   "Stop"              -> claude-term-bg.sh stop    (turn ended: TINT only if 
the final
+#                                                     assistant message is 
genuinely a
+#                                                     question/request; stay 
calm if it reads
+#                                                     as a completion — 
heuristic, see below)
+#   "PreToolUse"        -> claude-term-bg.sh wait    (matcher: AskUserQuestion 
— a structured
+#                                                     question was posed; 
EXACT signal)
+#   "PostToolUse"       -> claude-term-bg.sh reset   (matcher: * — a tool 
finished: calm while
+#                                                     working, and clears the 
tint the instant
+#                                                     you approve a permission 
prompt or answer
+#                                                     an AskUserQuestion)
+#   "Notification"      -> claude-term-bg.sh notify  (TINT for permission 
prompts only — EXACT;
+#                                                     the plain 60s idle ping 
is a NO-OP so it
+#                                                     can't wipe a pending 
question's tint)
 #   "UserPromptSubmit"  -> claude-term-bg.sh reset   (you replied — back to 
calm)
 #   "SessionStart"      -> claude-term-bg.sh reset   (fresh session — clear 
any stale tint)
-#   "PreToolUse"        -> claude-term-bg.sh reset   (actively working — stay 
calm; also
-#                                                     clears an 
approved-permission tint)
-#   "Notification"      -> claude-term-bg.sh notify  (tint for permission 
prompts only;
-#                                                     the plain idle ping 
stays calm)
+#
+# Two design notes that make the distinction hold:
+#
+#   * Why PostToolUse (not PreToolUse) clears the "you just acted" tint:
+#     PreToolUse fires BEFORE the permission prompt is shown, so it cannot 
clear
+#     a tint the prompt itself sets. PostToolUse fires AFTER the tool completes
+#     — the moment your approval/selection lets work resume — so it is the hook
+#     that returns the screen to calm once you have acted. (This is also why 
the
+#     general PreToolUse reset is gone: PreToolUse is now reserved for the
+#     AskUserQuestion->wait tint, and PostToolUse alone keeps work calm.)
+#   * Why the idle ping is a no-op (not a reset): the `Notification` hook also
+#     fires the plain "waiting for your input" ping ~60s after a turn ends. If
+#     that reset the background, a turn that ended on a genuine question 
(tinted
+#     by `stop`) would silently go calm after a minute. Leaving it untouched
+#     preserves whatever state `stop`/`notify` decided.
+#
+# The `stop` heuristic (the only non-exact signal): it reads the LAST assistant
+# text message from the session transcript (path arrives on stdin in the Stop
+# hook payload) and tints when that message ends as a question ("...?") or with
+# a strong trailing request ("want me to", "would you like", "should I", "OK
+# to", "your call", ...). A statement-shaped completion ("Done.", "All set.")
+# stays calm. Needs python3/python on PATH; if absent, `stop` defaults to calm
+# (only the two EXACT signals tint).
 #
 # Colours are overridable via the environment (e.g. inline in the hook 
command):
 #   CLAUDE_WAIT_BG    background while waiting   (default: #2a1a3a, a muted 
indigo)
@@ -84,16 +120,86 @@ set_reset() {
   fi
 }
 
+# Classify the final assistant message of a transcript as a genuine
+# question/request ("wait") vs a completion ("calm"). Prints one word.
+# Heuristic, anchored on the trailing sentence so a long completion that
+# merely *mentions* a question earlier still reads as calm.
+classify_last_message() {
+  local transcript="$1" py
+  py=$(command -v python3 2>/dev/null || command -v python 2>/dev/null || true)
+  [ -n "$py" ] && [ -f "$transcript" ] || { echo calm; return; }
+  "$py" - "$transcript" <<'PY' 2>/dev/null || echo calm
+import json, re, sys
+last = ""
+try:
+    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
+        for line in f:
+            line = line.strip()
+            if not line:
+                continue
+            try:
+                obj = json.loads(line)
+            except Exception:
+                continue
+            if not isinstance(obj, dict):
+                continue
+            msg = obj.get("message") if isinstance(obj.get("message"), dict) 
else {}
+            role = msg.get("role") or (obj.get("type") if obj.get("type") == 
"assistant" else None)
+            if role != "assistant":
+                continue
+            content = msg.get("content", obj.get("content"))
+            txt = ""
+            if isinstance(content, list):
+                for b in content:
+                    if isinstance(b, dict) and b.get("type") == "text":
+                        txt += b.get("text", "")
+            elif isinstance(content, str):
+                txt = content
+            if txt.strip():
+                last = txt
+except Exception:
+    print("calm"); sys.exit(0)
+
+t = re.sub(r'[`*_>#\s]+$', '', last.strip())   # drop trailing markdown/space
+if not t:
+    print("calm"); sys.exit(0)
+tail = t.splitlines()[-1].strip()
+blob = t.lower()[-240:]
+ask = tail.endswith("?") or any(p in blob for p in (
+    "want me to", "would you like", "should i ", "shall i ", "do you want",
+    "ok to ", "okay to ", "your call", "which would you", "let me know which",
+    "let me know if you'd like", "want that too", "proceed with",
+))
+print("wait" if ask else "calm")
+PY
+}
+
 case "$action" in
   wait|set)  set_wait ;;
   reset|off) set_reset ;;
+  stop)
+    # Turn ended. Tint ONLY if the final assistant message is genuinely a
+    # question/request; a completion-shaped message stays calm. This is what
+    # separates "Claude is asking me something" from "Claude finished — I'll
+    # start the next thing whenever." Payload (with transcript_path) on stdin.
+    payload=$(cat 2>/dev/null)
+    transcript=$(printf '%s' "$payload" \
+      | sed -n 
's/.*"transcript_path"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
+      | head -n1)
+    case "$transcript" in "~/"*) transcript="$HOME/${transcript#\~/}" ;; esac
+    case "$(classify_last_message "$transcript")" in
+      wait) set_wait ;;
+      *)    set_reset ;;
+    esac
+    ;;
   notify)
     # Notification fires for permission prompts AND the plain 60s idle ping.
-    # Tint only when it's something that genuinely wants the user to act.
+    # Tint for a genuine permission request; leave the background UNTOUCHED on
+    # the idle ping so it can't wipe a question-tint set by `stop`.
     msg=$(cat 2>/dev/null)
     case "$msg" in
       *permission*|*approve*|*"needs your"*) set_wait ;;
-      *)                                     set_reset ;;
+      *)                                     : ;;   # idle ping / other — no-op
     esac
     ;;
 esac

(airflow-steward) branch main updated: feat(agent-isolation): claude-term-bg.sh — distinguish blocked-on-decision from finished-idle (#503)

Reply via email to